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ABSTRACT 

We have constructed a comprehensive statistical model for Type la supernova (SN la) light curves 
spanning optical through near infrared (NIR) data. A hierarchical framework coherently models 
multiple random and uncertain effects, including intrinsic supernova light curve covariances, dust ex- 
tinction and reddening, and distances. An improved BayeSN MCMC code computes probabilistic 
inferences for the hierarchical model by sampling the global probability density of parameters describ- 
ing individual supernovae and the population. We have applied this hierarchical model to optical 
and NIR data of 127 SN la from PAIRITEL, CfA3, CSP, and the literature. We find an apparent 
population correlation between the host galaxy extinction Ay and the the ratio of total-to-selective 
dust absorption Ry. For SN with low dust extinction, Ay < 0.4, we find Ry f=s 2.5 — 2.9, while 
at high extinctions, Ay > 1, low values of Ry < 2 are favored. The NIR luminosities are excellent 
standard candles and are less sensitive to dust extinction. They exhibit low correlation with optical 
peak luminosities, and thus provide independent information on distances. The combination of NIR 
and optical data constrains the dust extinction and improves the predictive precision of individual SN 
la distances by about 60%. Using cross-validation, we estimate an rms distance modulus prediction 
error of 0.11 mag for SN with optical and NIR data versus 0.15 mag for SN with optical data alone. 
Continued study of SN la in the NIR is important for improving their utility as precise and accurate 
cosmological distance indicators. 

Subject headings: distance scale - supernovae: general - methods: statistical 



1. INTRODUCTION 

Type la supernova (SN la) rest-frame optical light 
curves have been of great utility for measuring fun- 
damental quantities of the universe. As standardiz- 
able candles, they were critical to the detection of 
cosm ic acceleration (|Riess et al.l 119981 iPerlmutter et al.l 
19991) . SN la have been used to constr ain the equation- 
of-sta te parameter w of dark energy (|Garnavich et all 

1998f >. a nd recent efforts have measured w to 

~ 10%. dWood-Vasev et al.l [20071: lAstier et al.l [20061: 
Kowalski et al.1 12008L iHicken et al.l I2009bb iKessler et al.1 
2009t iFreedman et al.ll2009t lAmanullah et al.ll2010[) . SN 
la have also been used to establish the extragalac- 
tic distance scale and measure the Hubble constant 
(IFreedman et al.ll200"lt Uha et al.lll999t iRiess et al J 120051 
2009aE), 

SN la distance indicators exploit empirical relations 
between peak optical luminosities of SN la and distance- 
independent measures such as light curve shape observed 
in the sample of nearby low-z SN l a (lHamuv et aJ jl 996a; 
IRiess et all [1991 Uha et all [2001 iHicken et al.1 l2009af). 
Methods include Am^JB) (lPhilliDslll993t lHamuv et al.1 
] 1199ft IPrieto et all 12006ft . MLCS 
JjlpF Uha et all 120071b "s tretch" 
(iGoldhaber et all |2001T C]VIAGIC (iWang et al.l [2 003) , 
SALT (IGuv eT^l2Qiba[2l007l) . and SiFTO (IConlev et all 
120081 ). One of the largest systematic uncertainties lim- 
iting the precision of distance estimates from rest-frame 
optical light curves is dust extinction in the host galaxy 
and the confounding of dust reddening with the intrin- 
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sic color variations of SN la (jConley et al.l [20071 ) . Cur- 
rent approaches differ conceptually and practically on 
how apparent colors, intrinsic colors, and dust effects are 
modeled. While most methods make use of the optical 
luminosity-ligh t curve width correlation, some methods , 
such as MLCS (|Riess et alJl!99^ fl998l Uha et al.l[2()07h . 
attempt to separately model the intrinsic colors of the 
SN la and host galaxy dust reddening and extinction, 
whereas othe rs model both e ffects with a single factor 
(e.g. SALT2,[GuiEitin[2003). 

Early observations o f SN l a in the infrared wer e 
made bvlKirshner et all (fl973h : lElias et all (fl98ll . fl985h : 
iFrogel et al.l ([19871 ) and iGraham et all (|1988l ) Obser- 
vations of nearby SN la in the NIR revealed that the 
peak near-infrared lumin osities of SN la have a dispersion 
smaller than 0.20 mag (lElias et all 119851: [MeiMe] 120001: 
iKrisciunas etall l2004al ldT iWood-Vasev et al.l (|2008f l 
(hereafter WV08) compiled a sample of NIR SN la obser- 
vations taken with the Peters Automated InfraRe d Imag- 
ing TELescope fPAIRITEL: iBloom et all 120061) . They 
found that the ii-band peak absolute magnitude, had 
small scatter ct(Mh) ~ 0.15 mag, and could provide dis- 
tance estimates competitive with those derived from op- 
tical light curve shapes. The effect of dust extinction is 
significantly diminished at NIR wavelengths, relative to 
the optical. The combination of optical and NIR obser- 
vations of SN la light curves could lead to eyen be tter 
estimates of SN la distances ()Krisciunas et al.ll2007l) . 

A significant source of puzzlement in the analysis of 
SN la light curves is the nature of the apparent color 
and brightness variations among supernovae, which are 
comprised of color and luminosity variations intrinsic to 
the SN la population, and also random reddening and 
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extinction by dust in the host galaxies of SN la. The 
function of dust absorption over wavelength is typically 
parameterized by the ratio of total to selective extinc- 
tion, Ry — Av/(Ab — Ay). This ratio has an aver- 
age value of 3.1 for interstellar dust in the Milk y Way 
Gala xy, although it can vary between 2.1 and 5.8 (jDraind 
2003). Studies of external gal axies have found s imilar 
extin ction curves; for example, iFinkelman et al.l (2008, 
2010) found average values of Ry ~ 2.8. Early studies of 
SN la found values of Ry < 1 (c.f. iBranch fe Tammannl 
1992. for a review) , although these analyses did not take 
into account relationships between the luminosity, color 
and light curve shap e of the events. U sing the first ver- 
sion of the MLCS (|Riess et al.l Il996al ) to model these 
relation ships for SN la optical light curves, iRiess et al.l 
(1996b) analyzed the colors of 20 nearby SN la and found 
Ry = 2.6 ± 0.3, c onsistent with the Milk y Way average. 
iTrlppl (fl99l and lTripp fc Branch! (fl999h found Ry w 1, 
but they modeled intrinsic co lor and dust r e ddeni ng as a 
single factor. More recently, iConlev et al.l (|2007t ) found 
that the relation between SN la optical luminosity and 
apparent color, controlling for light curve shape, required 
a low value of Ry w 1 — 1.7, if the total color variation 
is interpreted as interstellar dust in th e host galaxy. An 
analys is of the color curves of SN la bvlNobili fc Goobarl 
(|2"00& found Ry « 1 - 1.7. [Hicken et al.l (|2009bl> found 
that a dust absorption profile with Ry — 1.7 was fa- 
vored using the ML CS2k2 model for the CfA3 sample 
(jHicken et al.ll2009al ) . These studies assumed that a uni- 
versal color or dust absorption p rofile applied to all SN 
la. Recently. iWang et al.l (|2009af ) separately fit Ry « 1.6 
for a subset of SN la with high ejecta velocities, and 
Ry s» 2. 4 for a subset with normal ejecta velocities. 
However. Ifbley fc Kasenl (|2011f ) find Ry 2.5 for both 
subsets if the reddest SN are excluded. For individual, 
highly extinguished SN with multi-wavelength coverage, 
Ry can b e fit precisely and values of 1.5-1.8 have been 
reported dKrisciunas et al.l l2007t lElias-Rosa et al.l 120061 
120081: IWang et al.l 120081 ). It has been suggested that low 
Ry values could result from scattering by circum stellar 
dust in the local e nvironment of the supernova (jWand 

120051 IGoobaril2008h 

Contreras et al.l ()2010D recently presented the ini- 



tial sample of nearby SN la light curves observed by 
the Carnegie Supernova Project (CSP), a subset of 
which were observed in the optical and near infrared. 
iFolatelli et al.l (|2010l ) compared the apparent colors of 
the SN to a subset suspected of having no contamination 
by dust, and found a dust law slope Ry « 1.7 best fit 
the SN sample, assuming a global value of Ry. How- 
ever, when the two reddest SN were removed from the 
sample, this global value changed to Ry ~ 3.2. When 
minimizing dispersion in the Hubble diagram, they found 
R v « 1 — 2, both with or without the reddest SN. The 
different and unusual values of Ry in the literature are 
problematic for the proper analysis and interpretation of 
S N la observables and for cosmological applications. 

IMandel et al.l (|2009D presented a hierarchical Bayesian 
approach to constructing probabilistic models for SN la 
light curves. This strategy was applied to the modeling of 
the extant n ear-infrared (NIR) light c urves of SN la from 
PAIRITEL (|Wood-Vasev et al.ll200l) and the literature 
in the JHK S passbands. Using an MCMC algorithm 
(BayeSN) designed specifically for hierarchical SN la 



light curve models, they computed coherent probabilistic 
inferences for individual supernovae and the population, 
taking into account multiple sources of randomness and 
uncertainty. It was found that the variances of the peak 
absolute magnitudes were small, particularly in the H- 
band: <j{Mh) ~ 0.11 ± 0.03 mag. Since observations 
in the NIR passbands are insensitive to dust extinction, 
the estimation of host galaxy dust extinction was omitted 
from the NIR-only analysis in that paper. 

In this paper, we expand upon the hiera rchical model- 
ing ap proach for SN la first described by IMandel et al.l 
(2009), and apply it to statistical modeling of SN la 
light curves in both the optical and near infrared, in- 
cluding the effects of host galaxy dust. We describe a 
general mathematical representation of SN la light curves 
in terms of multiple decline rates over phase in different 
passbands. This Differential Decline Rates representa- 
tion is employed within a hierarchical model incorporat- 
ing multiple random effects: measurement error, pecu- 
liar velocities, dust extinction, and intrinsic variation. In 
particular, the intrinsic correlation structure of the light 
curves over phase and over wavelengths spanning optical 
to near infrared is explicitly modeled and estimated. We 
also model the joint distribution of dust extinction and 
Ry. To estimate the parameters of individual SN and 
the characteristics of the host galaxy dust distribution 
and intrinsic SN la populations, we have implemented a 
new BayeSN MCMC algorithm. This new algorithm in- 
corporates enhancements to improve efficiency and con- 
vergence in the global parameter space. 

We apply the hierarchical model to optical (BVRT) 
and NIR (JH) data of nearby SN la light curves from the 
PAIRITEL, CfA3, Carnegie Supernova Project (CSP) 
samples, and the literature. We present inferences about 
the host galaxy dust population, and the correlation 
structure of the intrinsic SN la light curve population. 
To check the fit of the hierarchical model to the sam- 
ple of data, we compare posterior predictive replications 
from the model to the observed parameter distributions 
of colors, magnitudes and light curve shapes. We quan- 
tify the utility of including NIR observations of SN la for 
improving estimates of host galaxy dust properties and 
distance predictions. Analyzing optical and NIR light 
curves within the same model, we demonstrate that dis- 
tance moduli to SN la observed with optical and near- 
infared light curve data can be predicted more accurately 
and precisely (rms = 0.11 mag) than with optical data 
alone {rms =0.15 mag). 

This paper is organized as follows: In §2, we describe 
our hierarchical Bayesian approach to constructing sta- 
tistical models for SN la light curves. In §3, we outline 
a new version of the BayeSN algorithm for computing 
probabilistic inferences with the hierarchical model using 
the supernova data. In §4, the application of the hier- 
archical model to nearby SN la data in the optical and 
near infrared is described, and posterior inferences about 
the dust and SN la light curve populations are summa- 
rized in §5. In §6, we describe checks on our model in- 
ferences. In §7 we employ cross-validation to construct 
a Hubble diagram of predicted distances to SN la, and 
demonstrate the advantages of including the NIR data 
for making more precise inferences about dust extinction 
and luminosity distances. We conclude in §8. 

In appendix <|2]we describe a non-parametric represen- 
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tation for individual light curves used by our model. In 
appendix S|B] we describe the calculation of if-correction 
and Milky Way extinction effects based on spectral tem- 
plates. Hyperprior distributions are stated in appendix 
SjCj Mathematical details of the new BayeSN algorithm 
are given in appendix ElDl 

2. STATISTICAL MODELS FOR SN IA LIGHT CURVES 

Inferences in supernova cosmology are based on sta- 
tistical models built from empirical data. The ap- 
plication of statistical models for SN la to constrain- 
ing the cosmological parameters has focused on us- 
ing information in the apparent, optical light curves 
of SN la to infer their peak luminosities and estimate 
the luminosity distances. In particular, these models 
capture empirical light curve width and color correla- 
tions with luminosity that allow SN la t o be used as 
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utility of using spectral ratios or characteristics of spec- 
tral lin es correlated with the luminosity to predict dis- 
tance s (jBailev et all 120091 : iBlondin. Mandel. k Kirshnerl 
1201 If) . These methods show promise, but need to be val- 
idated on larger spectroscopic samples to be competitive 
with light curve methods. 

The observed SN la light curve data is the result of 
the combination of multiple random effects. Different 
"normal" SN la can have intrinsically distinct absolute 
light curves, peak luminosities and colors. Each event 
can be extinguished and reddened by a different, and 
random, amount of host galaxy dust along the line of 
sight, and this dust may have different extinction laws 
as a function of wavelength for each event. Before the 
light curve is recorded by an astronomer, it is subject to 
redshift effects, absorption due to Milky Way dust, and 
measurement error. The measured redshift of each SN 
host galaxy is different from the cosmological redshift by 
a random peculiar velocity. Sensible statistical models 
for SN la light curves must account for these multiple 
random effects in the data. 

In the absence of the other effects, the apparent col- 
ors (e.g. B — V) of SN la at any phase are the sum of 
random intrinsic colors (e.g. (B — V)- lBil = Mb — My) 
and random amounts of reddening by host galaxy dust 
(As—Ay = E(B — V) ( iust)- Hence, the joint distribution 
of the apparent colors over different wavelength ranges 
and at different phases is the convolution of the intrinsic 
color distribution and the dust reddening distribution. 
Similarly, when the SN distances are known, the extin- 
guished absolute magnitudes (e.g. V(t) — fi) at different 
wavelengths and phases are the sum of random intrinsic 
absolute light curves (e.g. Mv(t)) and random amounts 
of dust extinction (e.g. Ay) over wavelength. The joint 
distribution of extinguished absolute light curves is the 
convolution of the intrinsic absolute light curve distri- 
bution and the dust extinction distribution. Since dust 
extinction only makes objects appear dimmer and red- 
der, the convolution with the dust distribution distorts 
the intrinsic distribution into the apparent distribution 
in the following ways. Clearly, the apparent distribution 
will have a dimmer and redder average light curve. The 
distribution of extinguished absolute magnitudes will be 



wider than the intrinsic distribution at any phase or 
wavelength. The dust will also induce or increase ap- 
parent positive correlations between absolute magnitude 
and color (in the redder-dimmer sense), between two col- 
ors (redder-redder), and between absolute magnitudes 
(dimmer-dimmer) at different wavelengths and phases. 
If we want to use SN la light curves to understand the 
statistical intrinsic properties of these physical events, or 
those of dust in distant galaxies, it is necessary to de- 
convolve these two effects in the observed data. 

Selecting a subsample of the observed SN, for exam- 
ple, the apparent blue end of the full sample, as repre- 
sentative of the "intrinsic" distribution, does not neces- 
sarily alleviate these distortions. Some previous stud- 
ies have selected an "unreddened" subsample based on 
auxiliary data, such as elliptical host galaxies or large 
physical separation of the SN from the center of the 
galaxy, which may suggest lack of dust extinction. How- 
ever, unless these auxiliary criteria can guarantee neg- 
ligible dust effects, the resulting subsamples may still 
be di storted if the r e is a chance for some dust extinc- 
tion. IHicken et al.l (I2009bf) showed with the large CfA3 
sample (jHicken et al.l l2009a) that SN with moderate es- 
timated dust extinction (Ay rj 0.4 mag) are found in 
elliptical host galaxies or at large projected galactocen- 
tric distances between the host galaxy and the SN. Fur- 
thermore, selecting an "intrinsic" subsample based on 
auxiliary data might distort inferences if the auxiliary 
properties are correlated with intrinsic properties, the 
distribution of which one is trying to identify. 

Statistical errors in the estimates of the random ef- 
fects can distort inferences of the intrinsic distribution. 
For example, if distances to nearby SN la are estimated 
via recession velocities and the Hubble law to infer abso- 
lute magnitudes, the effects of random peculiar velocities 
on distance errors can distort the inferred intrinsic dis- 
tributions of SN la absolute light curves. Even in the 
absence of dust, a histogram of simple point estimates of 
peak absolute magnitudes for each SN (e.g. Vq — fJ-{z)) 
will appear broader than the true, intrinsic distribution, 
P(My). Similarly, random peculiar velocities can appar- 
ently induce or strengthen a positive correlation between 
the absolute magnitudes at two wavelengths, and distort 
correlations of absolute magnitudes with other observ- 
ables if these random effects are not properly modeled. 
Measurement errors and errors in dust extinction cor- 
rections will also tend to distort joint distributions of 
inferred variables. Since extinction in the NIR is greatly 
diminished relative to the optical, inferences on the dis- 
tribution of intrinsic SN la light curve properties in the 
NIR are much less vulnerable to distortions by dust, but 
are still affected by the other sources of error. Super- 
novae far enough into the Hubble flow so that peculiar 
velocity effects are negligible will still be vulnerable to 
dust effects, especially at optical wavelengths. 

Hierarchical Bayes provides a framework for the prob- 
abilistic modeling of multiple sources of randomness and 
uncertainty. Its applicati on to statistical mod eling of SN 
la was first presented bv lMandel et a l. (2009), who con- 
structed hierarchical models for SN la light curves in the 
near infrared. The hierarchical framework provides a uni- 
fied method of inference for populations and individuals 
of those populations. It includes a population distribu- 
tion that models intrinsic variations and correlations of 
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SN la light curves, a population distribution for the the 
host galaxy dust to each SN, and models individual light 
curves, dust extinction, distances, and redshifts. Using 
Bayes' Theorem, probabilistic estimates for the unknown 
parameters of individual SN, as well as the hyperparame- 
ters of the populations, can be computed coherently and 
consistently. 

Statistical inference with hierarchical models provides 
a principled method of probabilistic deconvolution of 
physically distinct and random effects that are combined 
in the observed data. Probabilistic inference allows for 
not only the estimation of each separate effect, but also 
the exploration of the joint uncertainties and trade-offs 
between the multiple effects. It enables the estimation 
of the statistical characteristics of an underlying intrinsic 
population distribution while accounting for the distor- 
tions in the observed distribution caused by measurement 
error or other random effects. Similar issues regarding 
inferring the intrinsic distributions of inferred quantities 
in the presence of random error have been discusse d and 
specifi c Bayesian techniques hav e been applied bvlKellv l 
2001 : Kelly fc Bechtoldl (I2007T) . IHogg. Mvers. fc Bovvl 
,Eoi 



2010). lLoredo fc Hendrvl (|2010[ ). among others, in other 



astrophysical contexts. 

Statistical modeling of SN la light curves is inher- 
ently a multi-dimensional problem. Light curve obser- 
vations are essentially noisy, usually irregular time se- 
ries in multiple filters at different wavelengths. However, 
the absolute light curves exhibit regularities: for exam- 
ple, the fast declining light curves tend to be intrinsi- 
cally dimmer . Existing models for optical light curves , 
e.g. MLCS (IRiess et al.l H996aL fT998t [JhTetall l2007h. 
SALT (IGuv et all 120051120071). and Am ^fB) templates 
(|Hamuv et al.lll996bl : iPrieto et al.ll2006l ) attempt to cap- 
ture regularities by assuming strong functional forms, 
governing the "global" behavior of light curves over a 
wide range in phase and wavelength, and controlled by 
one or two parameters (e.g. A; x\,c\ Ami5(B), respec- 
tively). Although these formulations can be useful as a 
form of dimensionality reduction to project gross varia- 
tions in high-dimensional data onto a small-dimensional 
latent parameter space, it is not clear that this reduction 
can be done cleanly without loss of statistical informa- 
tion contained in the light curves. Detailed studies of 
well-sampled light curves reveal that, for example, SN 
with the same Ami^(B) measurement (the magnitude 
change in the i?-band light curve after 15 days from the 
peak) can display significant differences in their multi- 
band light curves oyer a range in phase (Folatcll i et al.l 
I2010t iHoflich et ail 12010ft . signifying that a single light 
curve shape parameter does not capture the full variety 
of light curve signals that are generated by the under- 
lying explosion physics. Furthermore, these global pa- 
rameters lack direct interpretability: even with a contin- 
uously well-sampled SN light curve, it is impossible to 
estimate the A parameter without knowing the particu- 
lar templates that attempt to project it onto the latent 
parameter space. 

In this paper, we take a different approach to modeling 
the light curves. Instead of adopting a strong param- 
eterization of the global behavior of the absolute light 
curves, we take a non-parametric approach that mod- 
els the shape of the light curves "locally." This does 
not mean that there are no parameters; rather, we use 



local parameters, describing the variations in signal in 
each neighborhood of phase and wavelength, to build up 
a model for the light curve over the full range of phase 
and wavelength. We develop a Differential Decline Rates 
model (i j2.ll $A| to represent the light curves in filters at 
multiple wavelengths using the decline rates over inter- 
vals in phase in each passband. Regularities underlying 
the population of light curves are then captured, not by 
one or two global parameters, but by inferring the cor- 
relations between the local parameters in the training 
set of well-sampled SN. From this perspective, the light 
curves are modeled as stochastic processes with covari- 
ance structure over phase and wavelength that must be 
estimated. The correlation structure in the light curves is 
modeled in the intrinsic population distribution for SN la 
light curves. Although there may be many local parame- 
ters, they are not each statistically independent once the 
correlation structure of the population is learned. In- 
deed, the intrinsic dimensionality of the light curves (i.e. 
the effective number of "global" degrees of freedom) is 
implicit in the estimated covariance structure, and does 
not need to be fixed a priori. By incorporating this non- 
parametric light curve model in the hierarchical Bayes 
framework, we can coherently estimate the joint uncer- 
tainties in the correlation structure and incorporate them 
into distance predictions for SN la. 

The probabilistic hierarchical approach also provides a 
principled framework for dealing with missing data. The 
observations are typically not obtained at an exactly reg- 
ular cadence: observation times often can be random or 
clustered, with gaps in temporal coverage due to weather 
or instrumentation. The SN in a given sample may not 
all be observed in the full set of passbands; in this pa- 
per, the SN are observed in the optical filters, but only a 
subset are observed in the NIR. The Bayesian approach 
deals with this by marginalizing over the unobserved light 
curves in the posterior distribution, thus incorporating 
this lack of information into inferences without omitting 
good incomplete data, which would be necessary if the 
analysis required the entire data set to be complete. In 
the absence of complete data, the model makes the best 
estimates and predictions given the available observed 
data. 

We have bu i lt upo n the basic framework described by 
iMandel et al.l (|2009ft . The overall structure of the hierar- 
chical Bayesian model is depicted by Fig. [T] We describe 
each component of the model in turn. 

2.1. Representation of Apparent Light Curves 

An apparent light curve model at phase t in rest-frame 
filter F with parameters Fq and 6 F = (<?L'"nl) ^ s § en " 
erally described by 

hC F (t;F o ,0 F )=F o + l F (t,0 F ) 



= F + l F (t; + ; f (*! #nl) ' 



where Fq is the apparent magnitude at t — in rest- 
frame filter F, and I (t; F ) is the normalized light curve 
in filter F, so that ^ F (0) = 0. The vector of linear 
light curve shape parameters is 9 F , and is a vec- 
tor of non-linear light curve shape parameters. The 
phase is defined in the rest-frame of the SN, with t = 
corresponding to the time of maximum light in _B, To: 
t = (T — Tq)/(1 + z), where z is the measured redshift, 
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Fig. 1. — Hierarchical framework for statistical inference with 
SN la light curves. The global posterior density of the hierar- 
chical model parameters given the full SN data set is represented 
formally with a directed acyclic graph. Unknown parameters are 
represented by open nodes. Observed data (redshifts z and mea- 
sured light curves 23) are represented by shaded nodes. Each ar- 
row or link describes a relationship of conditional probability. The 
hierarchical model coherently incorporates randomness and uncer- 
tainties due to measurement error (purple), intrinsic SN variations 
(green), dust extinction and reddening (red), peculiar velocities 
and distances (blue) into inferences about individual SN and the 
population. The graph can be understood as a generative model for 
the data. "SN la AbsLC Pop" represents parameters describing the 
population of SN la light curves, including intrinsic variations and 
correlations in shape, color and luminosity across multiple wave- 
lengths. From this population, each SN randomly draws a set of 
multi- wavelength light curves "AbsLC" . The box "Dust Pop" rep- 
resents parameters governing the population distribution of host 
galaxy dust values. Each SN randomly draws dust parameters 
Av,Rv from this distribution. These dust parameters combine 
with the individual absolute light curves and distance modulus to 
generate an apparent light curve "AppLC," which is sampled with 
noise to produce the observed multi-wavelength light curve data 
23. In the nearby universe, the distance modulus is related to the 
observed recession velocity or redshift through the Hubble law plus 
a noise term representing random peculiar velocities of host galax- 
ies. This random generative process is conceptually repeated for 
each SN in the data set. The difference between "training" and 
distance prediction is that the latter does not condition on the 
redshift-distance likelihood information of the SN (bottom). 

and T is the time of observation. For a multi- wavelength 
model using light curve observations corresponding to 
rest- frame filters F 1 , . . . , F N , the apparent light curves 
of a SN are described by a vector of apparent light curve 
parameters 



<p = (F \...,F N ,e F \...,e F ") 



(2) 



and the time of maximum light in B, Tq. 

The models employed in this paper do not use 
non-linear shape parameters, so F — 9 F . To specify 
the light curve functions l F (t) and l F (t), we construct 
a representation in terms of differential decline rates, 
as described in Appendix In this representation, 

the light curve shape parameters 6 F for each filter are 
simply the changes in magnitude over disjoint intervals 
in phase. Let d F be a vector of decline rates of a 
light curve in filter F on a grid in phase, set to t = 
(-12, -8, -4, -2, 0, 2, 4, 6, 8, 10, 12, 15, 18, 23, 30, 37.5, 45) 
days. The decline rates are positive after maximum light 
and negative before peak in each filter. For a given set 
of decline rates 6 F = d F , the normalized light curve for 
each filter at arbitrary phase is constructed with with 



a smooth curve defined by non-parametric regression 
cubic spline. In this representation the light curve 
function l F (t) = 0, and is determined by a linear 
smoothing spline: LC F (t; F , 6 F ) = F + l({t) ■ 8 F . 

2.2. Likelihood Function for Apparent Light Curves 

In this section, we describe the likelihood function for 
the apparent light curve model parameters, conditional 
on the the observed light curve data. The likelihood 
function explicitly accounts for if-corrections from the 
rest-frame filter to the observer-frame filter, Milky Way 
extinction, and photometric measurement error. 

A light curve measurement in the observer frame O 
at time T is m° . This observation differs from the ap- 
parent light curve model in the rest-frame through K- 
corrections (to account for the redshifting of the SN spec- 
trum), Milky Way extinction, and measurement error. 
At each redshift, we construct a unique mapping between 
each observer-frame filter O and a rest-frame filter F (in 
this paper, 0,F e {B, V, R, I, J, H}). 

m° = kc OF (t; z, <t>) + gx OF (t; z, 0, E MW ) 

+ LC F (t;F ,9 F )+e. 

The if-correction for the redshift z supernova magni- 
tude at phase t from rest-frame filter F to observer 
frame filter O is kc OF (t; z, <p). The if-correction has 
a dependence on the supernova spectral energy distri- 
bution (SED), and this is modeled as a function of ap- 
parent color. Thus, it depends on the apparent light 
curve parameters <p only through the apparent model 
colors at the same phase. For example, at low redshifts 
z < 0.05, if the observer-frame filter is O = B, then 
the rest- frame filter is F = B, and the dependence of 
kc OF (t; z, <p) on the model light curve parameters (p is 
through the apparent color B — V at phase t. The ef- 
fective Milky Way extinction, gx OF (t; z, <p, Emw), also 
depends on the SN SED through the colors, and is also a 
function of the estimated color excess due to Milky Way 
dust, Emw = E(B — V)mwi which i s obtained from the 
ISchlegel. Finkbeiner. fc Davisl (|1998D dust maps. Details 
regarding the calculation of if-corrections and Milky 
Way extinction for SN la are presented in Appendix 
SjB] The variance of the random error term e includes 
photometric error, and estimated uncertainties in K- 
corrections and Milky Way extinction. 

For each measurement in observer frame filter O at 
observed time T, we can write down Eq. [3] relating the 
measurement to the rest-frame light curve model. Let the 
observations in filter O be arranged into a time-ordered 
vector m , with each observation listed from earliest to 
latest. The corresponding equations can be also be time- 
ordered. If the supernova light curve is observed in mul- 
tiple filters, Oi, . . . , On, we can arrange the full data in 
time- filter ordering, so that m = (m 0l 7 . . .m° N ), with 
the observer frame filters ranked from shortest to longest 
central wavelength. With this arrangement, a time- filter 
ordered vector equation can be written for each SN: 

m = KC(T ; z, 0) + GX(T o ; z, 0, E M w)+L 2 {T , z)<j>+e. 

We suppress the explicit dependence on the known obser- 
vation times T. Each of the terms depends on the time 
of maximum Tq, and the time-dilating redshift through 



() 



the phase t = (T — Tq)/(1 + z). Each element of the 
vectors KC and GX corresponds to the i-T-correction or 
Milky Way extinction scalar in Eq. [3l Here L2 is the 
unique matrix that, when multiplied with the apparent 
light curve parameter vector d>, computes the rest-frame 
apparent light curve model corresponding to each time- 
filter ordered observation in m. Its rows are constructed 
from the individual vectors li(t). 

If the random errors e are normally distributed, the 
likelihood function of the unknowns To, <p for the full 
data set for a single SN is 



P(m\T ,<j>,z) = 

N[m\ KC(T ; z, 0) + GX(T ; z, <$>, E MW ) 
+ L 2 (T ,z)4>,W] 



(5) 



where W is the error covariance matrix, and N(x\fi, S) 
is a multivariate Gaussian probability density with mean 
vector fi and covariance matrix X. In general, W can 
be a full positive definite symmetric matrix, if the errors 
due to photometric calibration, i<T-corrections, and Milky 
Way extinction are correlated across the observer filters 
and observation times. Our algorithms allow for this to 
be a full matrix, but for this paper, we take the simple 
approach of assuming it is diagonal, using only the mea- 
surement error variances. We symbolize the light curve 
data, the observed magnitudes and times, and the error 
covariance for each SN, as V s — {m, W, T}. 

2.3. Redshift- Distance Likelihood Function 

The theoretical relation between the cosmological red- 
shift z c and the luminosity distance 4 to a SN in 
a smooth cosmological model depends on the cosmo- 
logical parameters f2jvf ,&A,w and the Hubble constant 
h = H o /100 km s _1 . At low redshifts, distances are in- 
sensitive to the cosmological model, and if we are con- 
cerned only with ratios of distances (or differences of dis- 
tance moduli), then it is sufficient to fix h. For this paper, 
we are not concerned with constraining cosmological pa- 
rameters, but on the statistical properties of SN la light 
curves, so we fix CIm = 0.73, fl\ = 0.27, w = —1 and h = 
0.72. The cosmological redshift may differ from the mea- 
sured redshift z through measurement error and random 
peculiar velocities. The expected value of the distance 
modulus at red shift z is f(z) = 25+ 5 log 1Q [dj J (z) Mpc" 



As described in lMandel et al.l (|2009f ) . the likelihood func- 
tion of the distance modulus given the measured redshift 

P( M | Z) = N\M I /(*), °2 = [/'(*)] V* + ^pec/c 2 )]. (6) 

In the low- 2 regime, where dj^(z) is linear in z (the Hub- 
ble law), the variance is 



z In 10 



pec 



(7) 



where is the redshift measurement variance and <7p 0C 
is the expected variance due to random peculiar veloci- 
ties. In this paper, we have alternately taken cr pcc = 150 
(|Radburn-Smith et al.l l200l and 300 km s^ 1 . O ur re- 
sults are consistent between the two values. We used the 
measured redshifts corrected to the cosmic microwave 
background frame and the local infall flow model of 
IMould et all (|2000h . 



In this paper, we are only concerned with evaluating 
distance predictions for low-z SN la, so these param- 
eters are fixed to their concordance values. The de- 
pendence of the redshift-distance relation on the cos- 
mological parameters could be made explicit by writing 
P(p\ z; fijif, ^Aj w) and allowing them to be free parame- 
ters that appear in the global posterior density (Eq. flT)l) 
of a cosmological sample of SN la. 

2.4. Latent Variable Model and Host Galaxy Dust 

The vector cf> in Eq. [5] encodes the information needed 
to construct the apparent light curve model in the rest- 
frame filters. Using the differential decline rates repre- 
sentation, this vector encodes the peak apparent magni- 
tudes and the decline rates of the apparent light curve in 
multiple filters over intervals in phase. The latent param- 
eter vector ip encodes the information for constructing 
the absolute light curve model in the rest-frame filters: 
the peak absolute magnitudes in rest frame filters, and 
the decline rates of the absolute light curves. The two 
sets of parameters are related by host galaxy dust ex- 
tinction and distance: 



<f> = ip + A + vfi. 



(8) 



The vector A = Ay (a + (3/Ry) represents the effect of 
extinction on the absolute light curve parameters, and 
is a function of the host galaxy extinction, Ay, and 
the slope of th e extinction law, Ry , using t he du st ex- 
tinction law of lCardelli. Clayton, fe Mathisl (1989). We 
m odel the effect of host galaxy extinction as described 
in lJha. Riess. fc Kirshne r (2007). For a given (Ay,Ry), 
the effective dust extinction in filter F at phase t is 



A F (t) = AvC F (t)(a F + b F /R v ). 



(9) 



The coefficients a F and b F model the effect of dust ex- 
tinction on the supernova SED within each passband F 
at the time of maximum light. The functions £ F (t) model 
the change of this effect with phase due to the evolving 
supernova SED. The constant vector a is constructed 
with components 

!a F , if 4>j is a peak magnitude, Fq 

A£pCL F , if 4>j is a decline rate in filter F 
between phases and Tk+i 

(10) 

where A(p = [Cf^+i) — ( F ( T k)]- The constant vector 
(3 is defined analogously, in terms of £ F and b F . This 
accounts for the effect of dust extinction on the apparent 
magnitudes and light curve shape through the evolving 
SN la SED with phase. 

Since distance only changes the magnitude, but not the 
shape of the light curve (after accounting for time dila- 
tion), the constant vector v is defined with components 



1, if <j)j is a peak magnitude, Fq 
0, otherwise 



(11) 



With these constructions, we use Eq. [8] to relate the ap- 
parent light curve parameters <fi to the latent variables of 
extinction Ay, Ry, the distance modulus \x, and absolute 
(intrinsic) light curve parameters 



V> = (M 



Pi 



,M Fn ,8 Fi ,...,§ Fn ) 



(12) 
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where 6 F contain the decline rates of the absolute model 
light curves in each rest-frame filter. Since the model 
for the extinction in each passband and phase, Eq. [9l 
is linear in the inverse of Ry, we find it useful to define 
ry = Ry 1 to simplify the notation. 

2.5. Population Distribution Model for Intrinsic 
Absolute Light Curves 

Even normal SN la do not all have the same lumi- 
nosities, intrinsic colors, or light curve shapes. The het- 
erogeneities of these properties in the population of SN 
la - which cannot be explained by dust or distance - 
are called intrinsic variations. Estimation of the covari- 
ances in the population of SN la light curves is crucial 
to the utility of SN la as standardizable candles for dis- 
tance estimation. For example, the well-know n width- 
luminosity correlation of optical light curves (IPhillipsi 
[1991 lHamuv et al"]|1996bt iPhillips et aT1H999l ) allows us 
to estimate the distance modulus to a SN to ~ 0.2 mag. 

To model and capture intrinsic correlations between 
the absolute magnitudes at multiple wavelengths and the 
shapes of their light curves, we need to specify a gen- 
eral correlation structure for the population distribution 
of ip. Ideally, the population distribution P(ip) would 
be specified by reliable astrophysical theory. However, 
current explosion models for SN la do not provide such 
detailed guidance regarding the expected distribution of 
absolute light curve properties. Thus, we seek to model 
the population distribution of light curves generally, and 
infer the statistical properties of the intrinsic variations 
from the data. We capture the intrinsic variations and 
correlations of SN la absolute light curves by modeling 
the distribution of the intrinsic parameters ip as a mul- 
tivariate Gaussian: 



it is simple to estimate xp u given ip°. The full hyperpa- 
rameters can be partitioned in the same way: 



xp s ~ N(n^j, £</,) 



(13) 



with mean vector fi^ and intrinsic covariance matrix 
S^,. The intrinsic covariance matrix models popula- 
tion correlations between the peak absolute magnitudes 
at different wavelengths, correlations between the light 
curve decline rates at different phases and wavelengths, 
and correlations between peak absolute magnitudes and 
light curve decline rates at different phases and wave- 
lengths. By capturing the population correlations of 
the absolute light curves in multiple filters at differ- 
ent wavelengths, we implicitly also model the correla- 
tion structure of the intrinsic colors. The intrinsic co- 
variance matrix can be readily decomposed into a ma- 
trix of intrinsic correlations, R^ and a vector of intrin- 
sic standard deviations er^, one for each component ip 1 : 
X.0 = diag(<r^)i?^ diag(er^). Each element of R^ is a 
correlation coefficient between —1 and 1. A valid 
must be positive definite and symmetric. Since ip di- 
rectly describes the multi-band absolute light curves, Eq. 
[TBI models them as a stochastic (Gaussian) process. 

Suppose the intrinsic parameters vector ip = (ip u ,ip°) 
of a particular SN can be partitioned into parameters ip° 
tightly constrained by its observations, and parameters 
ip u that are not directly observed. For example, ip° may 
be the decline rates describing the shapes of this super- 
nova's light curves, and ip u may be the peak absolute 
magnitudes of this SN under distance prediction. With 
this population model for intrinsic light curve variations, 



/.,u\ /x^uu x^uo 

., — l l v — f ^ J V' 

— I ,,o h ^i/) — I \T>ou Voo 



(14) 



Using standard theorems for the multivariate normal dis- 
tribution, the expected value of ip u , conditional on ip° 
is E[^\xP°] = v% + E^E" )- 1 ^ - and its 

conditional covariance (uncertainty) is Cov[ip u \ip°] — 
- S^fS™)-^™. This example demonstrates how 
this model uses the correlation structure of the abso- 
lute light curves to relate inferred variables to observable 
quantities, and vice versa. 

In the absence of host galaxy dust {Ay — 0) and 
measurement error, an estimator for the predicted dis- 
tance can be derived straightforwardly for well-sampled 
light curves. Suppose that apparent light curve parame- 
ters <p could be measured perfectly for well-sampled light 
curves with vanishing measurement error. If the intrin- 
sic mean fi^p and covariance S^, were known, then the 
posterior prediction of the distance modulus has mean 
A = ^ vT ^ 1 (<P- MvO and variance = (i> T £^ 1 t>)~ 1 . 
In fact, fi is the minimum variance unbiased linear es- 
timator of the distance modulus, a result that does not 
depend on the Gaussianity of the intrinsic distribution 
of ip, the absolute light curves parameters. This can be 
shown by noting that p, is the generalized least squares 
solution of Eq. [5] and invoking the Gauss-Markov theo- 
rem. However, the presence of a random amount of host 
galaxy dust for each supernova, and finite sampling and 
measurement error of the light curves necessitates mod- 
eling these other aspects of the hierarchical structure. 

Modeling intrinsic variations of ip in the population us- 
ing the covariance structure of a multivariate Gaussian 
is the simplest choice. However, if non-Gaussianities be- 
come important then it will be possible to replace this 
simple assumption with more complex distributions. For 
example, a Gaussian mixture model could be used to 
describe a multi-modal population, Student-i distribu- 
tions can be employed to model fat-tailed populations, 
and non-linear regression could capture non-linear corre- 
lation structure. Alternatively, one might seek a repre- 
sentation or parameterization ( §2.1[) for ip that makes its 
population distribution more amenable to modeling with 
a simple forms. For the application in this paper, we did 
not find these more advanced approaches to be necessary, 
so we postpone their discussion for future work. 

Observables that are not derived from light curve 
data, such as host g alaxy masses (e.g. iKellv et aLll2010l : 
Sullivan et al.ll2010f ) or spectroscopic measurements (e.g . 
Bailev et all 120091 [Blondin. Mandel. fc Kirshnerl I2011L 



Foley fc Ka scn 2011), can be correlated with the intrin- 
sic absolute light curves. They can be included in this 
framework by augmenting ip with an auxiliary parame- 
ter, and specifying a likelihood function describing the 
uncertainty in the new observable. The joint distribu- 
tion P(ip) would model the covariance structure of the 
intrinsic light curves along with the auxiliary property, 
which can be leveraged to compute distance predictions 
using the extra information. 

2.6. Population Models for Host Galaxy Dust 



We also adopt models for the population distribu- 
tion of host galaxy dust parameters Ay and ry for 
each SN. Their joint population distribution can be fac- 
tored as P(A v ,r v ) = P(r v \ Ay)P(Ay). The extinc- 
tion Ay values are assumed to be drawn from an expo- 
nential distribution descr ibing dust along lines of sigh t 
from SN host galaxies (|Jha. Riess. k Kirshnerl 120071 ): 
Ay ~ Expon(T J 4). The probability density is 



P(A v \t a ) = 



-A v /t a 







Ay > 
Ay < 



(15) 



with an unknown hyperparameter, the exponential scale 
length ta, which is inferred from the hierarchical poste- 
rior probability density conditional on the data. 

Even along lines of sight within the Milky Way, inter- 
stellar dust can cause varying amounts of reddening for 
a given amount of absorption or extinction. This ratio 
is captured by the parameter Ry = Ay /E(B — V). Al- 
though the average value within the Milky Way is 3.1, 
it can range from 2.1 to 5.8, and depends on the na ture 
of the dust grains (|Cardelli et alJfl989t IDraindl2003l and 
references therein). 

Previous studies have focused on the treating Ry as 
a constant for all supernovae, either set to the Milky 
Way value, or left as a fit parameter. However, this is a 
rather strong assumption, so here we consider the possi- 
bility that the Ry of dust within the distant host galax- 
ies of SN la may vary within a common distribution. We 
also wished to explore whether Ry could be systemati- 
cally different for SN with different Ay dust extinctions. 
To test for and capture potential population correlations 
between Ay and ry, we consider several models for the 
conditional population distribution ry = Ry 1 given Ay. 
We consider six cases with the following assumptions: 

1. Ry =3.1. The host galaxy dust law slope is fixed 
to the Milky Way interstellar average for all SN. 

2. CP: (Rv — const). Complete pooling; Ry has 
the same value for each SN, but this value is un- 
known and inferred from the posterior density. 

3. NP. No pooling. The {ry} for each SN are com- 
pletely independent with a flat prior P(r v ) oc 1 on 
each. 

4. PP: m = 0. Partial pooling. The {r v } are condi- 
tionally independent draws from a common Gaus- 
sian population distribution independent from the 
magnitude of extinction Ay . 



(16) 



The mean [i r and variance of of this population are 
unknown and inferred from the posterior density. 

5. PP: m = 1. Partial pooling. The {r v } are condi- 
tionally independent draws from a common popu- 
lation distribution with a mean linearly dependent 
on Ay. 

T$\A v ~N(Po + 0iAlr,o*) (17) 

The regression coefficients (3 and residual variance 
of of this population are unknown and inferred 



from the posterior density. The intercept /3q rep- 
resents the population mean ry value at vanishing 
extinction, and /3i captures a potential trend of ry 
with increasing Ay. 

6. PP: Steps. Partial pooling with the step function 
distribution. The range in Ay is divided up into 
two or four intervals (c.f. Tables [3] k \5§ . Within 
each interval, the r v for each SN is a conditionally 
independent draw from a common Gaussian dis- 
tribution with mean /x r and variance <r~ (Eq. I16p . 
These hyperparameters are estimated from the pos- 
terior density. 

In all cases, we limited Ry to an allowed range: 0.18 < 
r v < 0.7 (1.4 < Ry < 5.6). 

Cases 4, 5 and 6 perform partial pooling, or shrinkage 
estimation, of the ry parameters, which is characteris- 
tic of hierarchical Bayes models. In the complete pooling 
case, it is assumed that the dust in the host galaxies of 
SN all have the same value of Ry, and thus all the infor- 
mation in the sample of SN is used ("pooled") to infer 
the single Ry value. In the case of no pooling, it is as- 
sumed that the host galaxy dust for different SN can have 
different, independent values of ry, and that only the in- 
formation from each SN is used to infer the Ry value 
for that SN. These two cases are limits of partial pooling, 
in which the information for each SN is combined with 
that of the population to produce the individual Ry es- 
timates. The appropriate weight between the individual 
SN information and that of the population is negotiated 
by the inferred population variance, of, and the preci- 
sion with which Ry can be estimated independently for 
each SN. As of — > 0, we obtain complete pooling, and 
for of — > oo, we have effectively no pooling. At inter- 
mediate values, partial pooling finds a middle ground 
between the noisy and possibly unstable estimates of no 
pooling, and the possibly unrealistic strong assumptions 
of complete pooling. The hyperparameter of. can be un- 
derstood as the residual variance of ry in the sample after 
accounting for the other sources of error for each individ- 
ual SN. Shrinkage accounts for the fact that a histogram 
or scatter plot of individual, simple, point estimates of 
derived quantities will be wider than the true, intrin- 
sic distribution of those quantities, if those point esti- 
mates have significant uncertainties, and it reduces the 
mean squared error of each parameter estimate. From 
a non-Bayesian perspective, shrinkage can be regarded 
as an adaptive regularization that determines from the 
data how much to allow an individual estimate to deviate 
from the population average or trend. Shrinkage estima- 
tio n with multi-level model s has been discussed recently 
bv lLoredo k Hendrvi pOlOl ). 

We can test the hypothesis that the dust law parameter 
ry has no dependence on host galaxy extinction Ay by 
comparing the results from fitting Cases 4 and 5. Since 
Case 4 is a nested case of Case 5 with /3\ — 0, we can 
check to see whether or not the marginal posterior den- 
sity of /?i is consistent with zero when fitting Case 5. 
Similarly, with Case 6, we can check to see whether the 
population means \x r in each interval in Ay are consis- 
tent with each other across the range of Ay, or if there 
are significant differences. 

For brevity and specificity, in subsequent sections de- 
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scribing statistical computation, we adopt the assump- 
tions of Case 5; the hyperparameters governing the ry 
population are (3 and of. For other cases, the hyperpa- 
rameters are /x r and of and the algorithms are appropri- 
ately modified to account for the different models. 



2.7. Specifying the Hyperpriors 

Diffuse, or non-informative, hyperpriors are adopted 
on the highest-level hyperparameters of the hierarchical 
model: ^,S^ for the SN light curve population and 
Ta, \ir or (3, and of for the host galaxy dust popula- 
tion. As the number of well-observed supernovae, iVgN, 
becomes larger, the influence of the hyperpriors on the 
posterior estimates of the hyperparameters diminishes. 
Thus, so long as we include a sufficient number of SN 
in the hierarchical model, we can choose the particular, 
analytic forms of diffuse hyperpriors for computational 
convenience. We discuss some mathematical details in 
the appendix fEfCj). 



2.8. Global Posterior Probability Density 

iMandel et al.l (|2009f ) derived the global posterior prob- 
ability density over all the SN in the training set as a 
product of the light curve and redshift-distance likelihood 
functions for individual SN, population distributions for 
SN la light curves and host galaxy dust, and the hyper- 
priors on the hyperparameters of the population distri- 
butions. We construct the global posterior using the new 
component probability models described above. Let the 
time-filter ordered light curve data in the observer frame 
for SN s be T> s , with measured redshift z s . The unknown 
parameters for an individual SN are the apparent light 
curve parameters <p s , the distance modulus /i s , the ex- 
tinction A v , the slope of the dust law r v , and the time 
of maximum Tq. For a given set of population hyper- 
parameters, fJ»ip, ta, (3, of, the conditional posterior 
density for an individual SN is 



P(4>s,Tq,h s , A v , r s v | V s , z s ; fi^, H^,t a ,{3, of) 
oc P(m s \T^,(p s ,z s ) x P(n s \z s ) 
x P(ip s = <p s - vfi s - A 8 \ix^,T,^) 
xP(Ay,r v \T A ,f3,a 2 r ). 



(18) 



The training set data is comprised of the light curve 
data for all the SN in the training set V = {T> s } and 
their redshifts Z = {z s }. The unknown hyperparame- 
ters of the populations are the mean and covariance of 
the absolute light curve parameters fi^ , £,/, , the expo- 
nential scale of the extinction distribution ta, and the 
hyperparameters describing the Ry 1 distribution. The 
global joint posterior density of all supernova observables 
{<p s }, distance moduli {^ s }, dust parameters {A v ,r v }, 
and the population hyperparameters conditioned on the 
database V, Z is proportional to the product of Asn in- 



dividual densities multiplied by the hyperpriors: 
P({4> s , T S , fi s , A s v ,r v }; S^, t a ,(3, of | £>, Z) 

Y[P(m s \T°,4> s ,z s ) x P(n s \z s ) 

. s=l 

x P(%j) s = <p s - vfi s - A s \n 1 p,H 1 p) 



x P(A s v ,r v \T A ,f3,a 2 r ) 



x P(/x v ,,S v ,) x P(t a , (3,0-1). 

(19) 



To predict the distance of a SN using its light curve data, 
one sets the redshift-distance likelihood P(jj. s \ z s ) oc 1 in 
Eq. [T51 where we use tilde on parameters and data for 
prediction set SN. The tilde redshift z s is used for time- 
dilation and if-correction in the light curve likelihood 
function, but not to constrain the distance modulus in 
the redshift-distance likelihood. The marginal posterior 
predictive density is P(fJ- s \ D s , z s ; T>, Z), obtained by in- 
tegrating Eq. [TjJ] 

Equation [T5] is an explicit statement of the objective 
function for statistical inference for both training and 
prediction with the hierarchical model. By computing 
Eq. [T5J we solve for the most likely distributions of host 
galaxy dust and intrinsic luminosities, colors and light 
curve shapes that best account for the apparent distri- 
butions, conditional on the model assumptions. A di- 
rected acyclic graph (DAG) representing the hierarchical 
model and the global posterior density is shown in Fig- 
ured] The graph depicts a generative probabilistic model 
linking together the populations and indivi duals, and pa- 
rame ters and hyperparameters to the data ( Mandcl et al. 
2009). For simplicity, we have not shown the depen- 
dence of the light curve likelihood function on redshift 
through time-dilation and if-correction, and only show 
the redshift-distance dependence, which is the key differ- 
ence between training and prediction. 

3. IMPROVED MCMC WITH BAYESN 

It is important to distinguish between the tasks of sta- 
tistical inference and statistical computation. The for- 
mer entails deriving estimators for unknown quantities, 
conditional on data and the assumptions of the statisti- 
cal model. We have done that in the previous section by 
deriving the global posterior probability density, Eq. If 91 
The task of statistical computation is accomplished by 
specifying, constructing, and implementing algorithms 
for computing the numerical values of these estimators 
for the observed values of the data, under the assump- 
tions of the model. In this section, we describe our strat- 
egy for statistical computation of the posterior estimates 
by stochastic sampling from the global posterior proba- 
bility density, Eq. [Tj|] 

We perform fully Bayesian inference of the hierarchi- 
cal model using Markov Chain Monte Carlo (MCMC) 
sampling methods such as the Metropolis-Hastin gs al- 
gorithm (iMetropolis et al.l 119531: jHastingsl 11970ft and 
Gibbs sampling (|Geman fc Gemanlll984| ). IMandel et al.l 
(2009) presented the first MCMC algorithm (BayeSN) 
for hierarchical Bayesian inference with supernova light 
curves. We have made many modifications to the original 
BayeSN algorithm to incorporate the modeling of host 
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galaxy dust, X-corrections from the rest- frame model 
light curves to the observer-frame measured magnitudes, 
and to improve the computational speed and convergence 
rate of the chains. The new algorithm is largely com- 
prised of Gibbs sampling and Metropolis- Hastings jumps 
that do not require the user to "tune" jump sizes for each 
SN, as would be required for ordinary Metropolis steps. 
Instead, the algorithm only requires the specification of 
a jump size for the scalar time of maximum for each SN, 
Tq , which is a relatively easy task: a rms jump proposal 
of ~ 0.5 days was generically successful for rapid conver- 
gence for all SN. Movement along the other dimensions 
of parameter space is accomplished by Gibbs sampling or 
more general Metropolis-Hastings proposals that exploit 
the conditional structure of the posterior distribution to 
adaptively propose more global moves. By minimizing 
the amount of manual tuning required before running the 
MCMC, we have increased the ease-of-use and practical 
applicability of MCMC for fitting comprehensive hierar- 
chical models for SN la. A basic introduction to MCMC 
al gorithms for SN la probabilistic inference was g iven 
bvlMandel et al.l (120091); stat i stical references inc l ude | L iul 
(2002): lGelman et al.l (|2003f ): iRobert fe Casellal (|2005l ). 

We use this new BayeSN code to sample from the 
global posterior probability distribution of all individual 
parameters and the population hyperparameters of SN 
la light curves and host galaxy dust, conditioned on the 
supernova light curve data and redshifts. Here, we sketch 
the improved BayeSN Gibbs sampling algorithm. Fur- 
ther mathematical details can be found in Appendix £ lDl 
The state of the chain is determined by the current values 
of all the parameters and hyperparameters: 

S = ({cj> s ,T°,Li s ,A s v ,r s v },ii %b ,'E^,T A ,f3 1 a 2 r ). (20) 

We generate random samples from the global posterior 
distribution P(S\T>, Z) using a Markov chain, condi- 
tioned on the photometric light curve data for all SN 
V = {T> s }, and their redshifts Z = {z s }. We begin with 
crude, randomized guesses of the individual supernova 
parameters {</> s , TJ, A v ,r v } for all supernovae s in 
the data set. In each step, we update a subset, or block, 
of parameters from their conditional posterior density 
with the complement set of parameters (and the data) 
fixed to their current values. The choice of parameter 
blocks exploits the conditional independence structure 
of the directed acyclic graph of the inference, Fig. [1] 

1. Compute the absolute light curve param- 
eters {if> s } using Eq. [8] The condi- 
tional posterior density of the light curve 
hyperparameters is P{^, E,/,| •, V, Z) = 
P(^|S V ,;{^ S })P(S V ,|{^ S }). We update 
from the second factor, and then update fj,^ given 
E^ from the first (see $D]for details). (We use the 
notation (•) to indicate all the other parameters 
and data that have not been explicitly indicated.) 

2. Draw a new extinction scale hyperparameter ta 
from the conditional distribution P(ta\ ~,D,Z) = 
P(ta\ {A v }), by drawing a random number from 
an inverse gamma distribution. 

3. Obtain new values of (3 and 
a 2 from P{(3,a 2 \-,V,Z) 



P((3\al{r v ,Ay})P(a 2 r \{r v ,Ay}). First draw 
a new a 2 from an inverse gamma distribution. 
Conditional on that value, draw a new (3 from a 
Gaussian. 

4. Repeat the following steps for each supernova s. 

(a) Propose a new time of maximum T * s ~ 
N(To yS , s T ) according to a random walk from 
the previous estimate. It is usually sufficient 
to use st ~ 0.5 days. Given Tq , propose 
a new set of apparent light curve parame- 
ters for all filters, <fi* , from the distribution in 
EjDj Compute the Metropolis-Hastings ratio 
r to decide whether to accept the joint pro- 
posal (Tq , </>*) or to stay at the current values 

(b) Update the new distance modu- 
lus fi s from the conditional prob- 
ability density P(fi s \ -,V S , z s ) = 
P(fj, s \4> s ,A v ,r v ;fj,^,'E^;z s ), by sampling 
from a Gaussian distribution. 

(c) Draw a new host galaxy extinction 
Ay from the conditional posterior 
probability density P(A V \ ■ ,V S , z s ) = 
P(A v \cf> s ,fj, s , r v ; fjLj, E^, t a ,P, cr 2 ). This 
can be shown to be a truncated Gaussian 
distribution in A v > 0. 

(d) Draw a new host galaxy dust 
law slope r v from the condi- 
tional posterior P(r v \-,T> s ,z s ) = 
P{r v \(t> s ,Vs,A v ] ^^,13,01) by eval- 
uating the pdf on a fine grid on 
0.18 < r y < 0.7 and using g riddy Gibbs 
sampling ([Ritter fc Tannerlll992f ). 

(e) (optional) Perform a translation in the space 
of distance versus extinction: (Ay,fi s ) — > 
(Ay , / u s )+7(l, —x). Here, x determines a ran- 
dom direction along the trade-off. We select a 
rando m 7 using generalized conditional sam- 
pling (Liu 20Q2|) , and then translate to update 

(Ay, [Is). 

5. Finally, record the current state of the chain S. 
Repeat all steps until convergence. 

This algorithm generates an irreducible and ergodic 
Markov chain that will converge to a stationary distri- 
bution equal to the the global posteri or density. Eg. [T9l 
accor ding to standard theorems (e.g. IRobert fc Casellal 
2005]). It converges without Step 4e, but this step speeds 
the exploration of the trade-off between extinction and 
distance for each SN. We found that including this step 
reduces the auto-correlation scale for the slowest con- 
verging A v by a factor of ~ 3. 

4. APPLICATION 

4.1. Data Sets 

In this section, we describe the application of the hi- 
erarchical framework to inference with nearby SN with 
optical and NIR light curve observations (BVRIJH). 
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iMandel et all ()2009D analyzed a sample of SN with near 
infrared JHK S light curve o bservations compiled from 
the the PAIRITEL sample (|Wood-Vasev et all [2008h . 
and available published light curves from the literature 
(IJha et al.lll999tlHernandez et al . 2000; Krisciunas et al.l 
20001 120011: IDi Paola et al.l I200a IValentini et al.l 1200c 
J I2003L I2004blld . 
lo et all I2007T [l 



Krisciunas et al 
20061: IPastorello 



2007; El ias-Rosa et al.l 
Stanishev et al. | 2007 



Pignata et al.ll2008fl . Nearly all of the SN in the PAIR! 



TEL sample we re also observed in op tical filters U BVRI 
or UBVr'i' bv IHicken et al.l (|2009a|) . and nearly all of 
the SN from the literature were also observed at optical 
wavelengths. We s elected the same set of SN with NIR 
light curves used in IMandel et al.l ()2009l ) with the follow- 
ing exceptions: SN 2005ao was omitted for lack of qual- 
ity data points, and SN 20061f was omitted because its 
position was near the Galactic plane, so its Milky Way 
dust reddening estimate was very large and unreliable. 
Extensive studies of two of the well-sampled supernovae, 
SN 2005cf an d SN 2006X, were presented bv lWang et all 
((2008L [2009bh . 

Since the number of SN la with NIR light curves is still 
small, we supplemented this se t with SN from the r ecent 
CfA3 sample of nearby SN la (jHicken et al.ll2009a|) that 
had well-sampled optical light curves. The additional 
light curves increase the statistical strength in the opti- 
cal bands and stabilize estimation of the full hierarchi- 
cal model. They also provide a set of optical-only light 
curves to compare against the set of SN with optical and 
NIR light curves ($7|). We included SN with high quality 
light curve data at 0.01 < z < 0.065, with more than five 
observations in B band, and with the first observation in 
B occuring less than 10 days past maximum. Since these 
light curves lack NIR data, the model marginalizes over 
the missing light curves for all inferences. Some of the 
SN in the CfA3 sample were observed in the RI pass- 
bands, while some were observed in the r'i' passbands. 
In both cases, we map these observer-frame passbands 
to rest-frame RI passbands, and in the latter case, the 
^-correction takes into account the cross-filter transfor- 
mation. The if-corrections for observations in B and V 
filters mapped them to rest-frame B and V filters. The 
J and H observations were mapped to rest-frame J and 
H filters defined by 2MASS. 

We use only normal SN la with B decline rates 0.75 < 
Am 15 (B) < 1.6. As discussed bv IHicken et al.1 (|2009af l. 
fast decliners and SN 1991bg-like objects have different 
luminosity-light curve shape relations than normal SN la, 
and should be modeled separately, so we do not include 
them in our analysis. We ha ve also excluded th e peculiar 
SN 2006bt from the sample (|Folev et al.ll2010D . The full 
"CfA-l- literature" sample consists of 110 supernovae, 37 
of which have NIR observations. 

We modeled the rest-frame BVRIJH light curves of 
this set of SN listed in Table [U For each SN, if there was 
data in an observer frame passband mapping to a given 
rest-frame filter, we list the fitted peak apparent magni- 
tude in the rest-frame filter. If there were no observations 
for a given rest-frame passband, then the estimated peak 
magnitude is not listed. We h ave omitted ultraviol et data 
for the analysis in this paper. Kessl er et al.l ([2009) found 
that differences in the £/-band model between MLCS2k2 
and SALT2 lead to large (Aw ~ 0.2) differences in the 
cosmology using high redshift samples. As this work fo- 



cuses on low redshift data, we have omitted the U data to 
avoid calibration and standardization problems, selection 
effects, and iS-corrections between the u and U bands. 
Future work, applying our framework to high redshift 
SN la samples for cosmological analysis will carefully in- 
c orporate E/-band data. 

iContreras et al.l (|2010f ) recently published light curve 
observations for a set of nearby SN as part of the 
Carnegie Supern ova Project, and an an alysis of this data 
was discussed bv iFolatelli et al.l (|2010t ). These are high- 
quality well-sampled light curves with optical coverage, 
and a subset included contemporaneous NIR observa- 
tions as well. Comparison of the CSP and PAIRITEL 
data reduction and calibration for SN observed in both 
samples is ongoing. For joint analysis of the current pub- 
lished photometry of these sets, in $7j we augment our 
sample with 27 SN with CSP data passing the criteria 
and examine the distance predictions for each set as a 
consistency check. Twenty of these SN have joint optical 
and NIR light curve observations. 

4.2. Statistical Computation 

We ran the new BayeSN code (Q to coherently fit 
the apparent light curves with the Differential Decline 
Rates model f q2.1|) . to estimate host galaxy dust extinc- 
tion f £|2.4p and distance moduli f H2.3p . and to infer intrin- 
sic variations and correlations f £|2.5p and the host galaxy 
dust population (i j2.6p . The code samples the global pos- 
terior density (Eq. I19|) over all unknowns given the data. 

We seeded each chain with random, initial values of the 
SN and dust parameters {(f> s , A v , /j, s }. In many cases, 
the B light curve was sufficiently well-sampled that the 
time of maximum could be unambiguously determined. 
In these cases, we fixed the To to that value and did 
not re-estimate it when running the sampler (by setting 
st = in step 4a) . For SN with more uncertain To we re- 
estimated it during the fitting (st = 0.5d). Under Case 
1 f H2.6p . the initial values of Ry were all fixed to 3.1. For 
all other ry cases, the initial {r v } were randomized. 

The initial dust configuration of the SN set is ran- 
dom, and we do not make an initial estimation of the 
A v value before running BayeSN. Thus, it is possible 
that this randomized initialization of dust values may 
assign a high A v to an apparently blue SN, and a low 
Ay to an apparently red object. If we see that multiple 
chains starting with different and random initial dust 
configurations eventually converge to the same posterior 
estimate, we can be reassured that our final inferences are 
independent of the initial assignments of Ay,ry values, 
and that the probabilistic inference has sorted out the 
probable dust values over the set of SN. In Figure [2] we 
show that four independent and parallel chains training 
the hierarchical model over the set of SN, each starting 
with a different initial value for the host galaxy extinc- 
tion, converge to the same final estimates in the long 
run of the MCMC. This shows that our final inferences 
for the trained model are robust to the initial values of 
host galaxy dust, and indicates the self-consistency of 
estimates and convergence of the algorithm to a unique 
joint solution over the full set of SN. 

To perform training and prediction, the BayeSN code 
generated four parallel, independent chains of 2 x 10 4 
complete cycles (Steps 1-5). The initial values for each 
chain were generated by using different random numbers 
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BayeSN MCMC Convergence 




MCMC Sample 



MCMC Sample 



Fig. 2. — Example sample paths of Markov Chain Monte Carlo 
(MCMC) chains generated by the BayeSN MCMC sampling code. 
The full chain stochastically samples the parameter space of all 
individual SN in the set, and the populations of SN la light curves 
and the dust. This plot focuses on the coordinates of the chain 
concerning the visual extinction Ay to particular SN. Each color 
represents an independent chain starting from a randomized initial 
guess. The chains explore the full parameter space and converge 
within a few hundred iterations upon the same global posterior 
distribution. The posterior uncertainty in the estimate is reflected 
in the distribution (variability) of the chain samples upon con- 
vergence. The plot depicts the simultaneous convergence of the 
chains, both for the estimate of a single SN, and for estimates of 
the ensemble of SN, ensuring the attainment of a consistent global 
solution for the SN population. Each color represents one of four 
independent chains. For example, the blue line in each panel is a 
different coordinate (projection) of the same MCMC chain. 

for each independent chain. We thinned out the chains 
by recording only every 40th value. This reduces the au- 
tocorrelation between successive recorded samples and 
saves memory. To asses s convergence, we comp uted the 
Gelman-Rubin statistic (|Gelman fc Rubinlll992D for each 
parameter in the chain to compare the coverages of the 
independent chains. We considered a maximum G-R ra- 
tio less than 1.10 to indicate convergence. We discarded 
the first 20% of each chain as burn-in, and the chains 
were concatenated for analysis. 

5. RESULTS: POSTERIOR INFERENCES 

In this section, we report the posterior inferences of 
light curves and the population when the training set 
consists of all the SN and their rcdshifts (T>,Z). We 
report the posterior inference obtained when adopting 
Case 5 (m = 1) for the (Ay, ry) population model, 
which models linear trends between the dust slope ry 
and the dust extinction Ay. Posterior inferences can be 
described in terms of light curve fits and dust estimates 
for individual SN, intrinsic covariances in the population 
of SN light curves, and the population distribution and 
correlations of host galaxy dust properties. 

5.1. Individual Supernovae 

Optical and NIR light curve fits in the rest-frame 
are shown for one supernova, SN 2005eq, in Fig. [3l 
The points are the measured magnitudes in the observer 
frame minus the estimated if-corrections and Milky Way 
extinction in each passband. The black curves represent 
the fitted apparent light curves in each rest-frame pass- 
band, with each light curve represented by the differential 
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Fig. 3.— (top) Optical (CfA3, [Hicken et al.l I2009al) and NIR 
(RAIRITEL. IWood-Vasev et al.l200a) observations of nearby Type 
la SN 2005eq are fitted with a multi-band light curve model. The 
points are the observed magnitudes in each filter minus estimated 
^-corrections and Milky Way extinction, (bottom) Optical and 
NIR light curves of SN 2005eq are used to infer the host galaxy 
dust extinction properties. The hierarchical model enables coher- 
ent inference of host galaxy dust properties (Ay, Ry) (assuming a 
CCM dust law), while marginalizing over the posterior uncertain- 
ties in the dust and SN light curve populations. The cross indicates 
the marginal bivariate mode, and the two black contours contain 
68% and 95% of the posterior probability. The inferred NIR extinc- 
tion An i s much smaller than the optical extinction Ay and has 
much smaller uncertainty. This SN exhibits moderate extinction 
and reddening due to host galaxy dust. 

decline rates model The peak apparent magnitudes 

for each SN and the decline rate Amis(B) are listed in 
Table H 

We also depict the posterior inferences of the dust 
properties: the visual extinction Ay, the NIR extinc- 
tion, Ah, and the slope of the extinction law ry = Ry 1 ■ 
The bivariate marginal probability densities were esti- 
mated from the MCMC samples using kernel density es- 
timation. The marginal distributions integrate over the 
posterior uncertainties in individual light curve fits and 
the population distribution. For SN 2005eq, we find a 
moderate amount of visual extinction, Ay ~ 0.3 mag. 
We can see from the side-by-side comparison that not 
only is the i/-band extinction about five times smaller, 
but its uncertainty is also much smaller. 

Since dust extinction is nonnegative, Ay > 0, the 
posterior probability densities of the dust parameters is 
highly non-gaussian for SN with low extinction. For ex- 
ample, from Table IH we infer that SN 2006ax has little 
host galaxy dust extinction with the most likely value 
being Ay = 0.01 mag. However, it is uncertain enough 
that Ay — 0.12 still lies within 68% highest posterior 
density (HPD) contour. By contrast, the Ah estimate is 
near zero, and the 68% contour lies within Ah < 0.03. 
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Even SN with low extinction benefit from observations in 
the -ff-band by reducing the uncertainty in the dust es- 
timate. Table 2] lists summary statistics of the marginal 
posterior distribution of each host galaxy dust parameter 
for each SN, obtained from the MCMC samples. 

5.2. Intrinsic Correlation Structure of SN la Light 
curves in the Optical-NIR 

We use the hierarchical model to infer the intrinsic 
correlation structure of the absolute SN la light curves. 
This correlation structure captures the statistical rela- 
tionships between peak absolute magnitudes and decline 
rates of light curves in multiple filters at different wave- 
lengths and phases. We summarize inferences about light 
curve shape and luminosity across the optical and near 
infrared filters; a more detailed analysis of the intrin- 
sic correlation structure of colors, luminosities and light 
curve shapes will be presented elsewhere. 

5.2.1. Intrinsic Scatter Plots 

The hierarchical model fits the individual light curves 
with the differential decline rates model and infers the 
absolute magnitudes in multiple passbands, corrected for 
host galaxy dust extinction. For each individual SN light 
curve, we can use the inferred local decline rates d F to 
compute the Ami5(F) of the light curve in each filter. In 
the left panel of Figure 0] we plot the posterior estimate 
of the peak absolute magnitude Mb versus its canoni- 
cal Ami5(i?) decline rate with black points. The error 
bars reflect measurement errors and the marginal uncer- 
tainties from the distance and inferred dust extinction. 
This set of points describes the well-known intri nsic light 
curve decline rate versus luminosity relationship (Phillips 
I1993D . We also show the mean linear relation between 
M B and Am 15 (B) found by IPhillips et all (|1999D . who 
analyzed a smaller sample of SN la. The statistical trend 
found by our model is consistent with that analysis. The 
red points are simply the peak apparent magnitudes mi- 
nus the distance moduli, Bq — /i, which are the extin- 
guished peak absolute magnitudes Mb + Ab- Whereas 
the range of extinguished magnitudes spans ~ 3 magni- 
tudes, the intrinsic absolute magnitudes lie along a nar- 
row, roughly linear trend with Am\^(B). 

In the right panel, we plot the intrinsic and ex- 
tinguished absolute magnitudes of SN la in the H- 
band. In contrast to the left panel, the differences 
between the intrinsic absolute magnitudes and the ex- 
tinguished magnitudes are nearly negligible. Notably, 
there is no correlation between the intrinsic Mh in 
the NIR and optical Am^jB). Thi s was noted previ- 
ously by [Krisciunas et al.l (|2004aD and lWood-Vasev et al.l 
(2008). The standard deviation of absolute magnitudes 
is much smaller in H than in B, demonstrating that 
the NIR SN la light curve s are good standard can- 
dles (iKrisciunas et al-l feOCMallct iWo od-Vasev et "all 120081: 
iMandelet al.l l2009D. Theoretical models of lKasenf( 2006 s ) 
indicate that NIR peak absolute magnitudes have rela- 
tively weak sensitivity to the input progenitor 56 Ni mass, 
with a dispersion of ~ 0.2 mag in J and K, and ~ 0.1 
mag in H over models ranging from 0.4 to 0.9 solar 
masses of 56 Ni. The physical explanation may be traced 
to the ionization evolution of the iron group elements in 
the SN atmosphere. 
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Fig. 4. — (left) Post- maximum optical decline rate Ami5(B) ver- 
sus posterior estimates of the inferred optical absolute magnitudes 
Mg (black points) and the extinguished magnitudes Bo — A 1 ( re d 
points). Each black point maps to a red point through optical 
dust extinction in the host galaxy. The intrinsic light curve width- 
luminosity Phillips relation is reflected in the trend of the black 
points, indicating that SN brighter i n B have slow e r decl ine rates. 
The blue line is the linear trend of [Phillips et al. (1999). (right) 
Inferred absolute magnitudes and extinguished magnitudes in the 
near infrared iJ-band. The extinction correction, depicted by the 
difference between red and black points, is much smaller in H than 
in B. The absolute magnitudes Mjj have no correlation with the 
Ami5(B). The standard deviation of peak absolute magnitudes is 
also much smaller for Mjj compared to Mg . 

These scatter plots convey some aspects of the popu- 
lation correlation structure of optical and near infrared 
light curves that is captured by the hierarchical model. 
In the next section, we further discuss the multi-band 
luminosity and light curve shape correlation structure in 
terms of the estimated correlation matrices. 

Figure [5] shows scatter plots of optical-near infrared 
colors (B — H,V—H,R—H,J — H) versus absolute mag- 
nitude (Mb, My, Mr, Mb) at peak. The blue points are 
the posterior estimates of the inferred peak intrinsic col- 
ors and absolute magnitudes of the SN, along with their 
marginal uncertainties. Red points are the peak apparent 
colors and extinguished absolute magnitudes, including 
host galaxy dust extinction and reddening. These plots 
show correlations between the peak optical-near infrared 
colors and peak optical luminosity, in the direction of in- 
trinsically brighter SN having bluer peak colors. In con- 
trast, the intrinsic J — H colors have a relatively narrow 
distribution, and the near infrared absolute magnitude 
Mh is uncorrelated with intrinsic J — H color. 

5.2.2. Intrinsic Correlation Matrices 

Using the hierarchical model, we compute posterior in- 
ferences of the population correlations between the dif- 
ferent components of the absolute light curves of SN la. 
This includes population correlations between peak ab- 
solute magnitudes in different filters, p(Mp, Mf>), cor- 
relations between the peak absolute magnitudes and 
light curve shape parameters (differential decline rates) 
in different filters, p(Mp,d F ), and the correlations be- 
tween light curve shape parameters in different filters, 
p(d F ,d F ). They also imply correlations between these 
quantities and intrinsic colors. This information and its 
uncertainty is captured in the posterior inference of the 
population covariance matrix of the absolute light 
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Fig. 5. — Inferred absolute magnitudes Mp (blue points) and 
the extinguished magnitudes Fq — fi (red points) versus colors rela- 
tive to NIR H (intrinsic: blue points; apparent: red points). Only 
SN with complete BVRIJH data are plotted. The intrinsically 
optically bright SN tend to be intrinsically bluer in optical-NIR 
color. The H-band absolute magnitudes have no trend with intrin- 
sic J — H colors, which have a comparatively narrow distribution. 
Note that the magnitude and color axes have the same scale in 
each panel. 

curve parameters {ip s }- The posterior estimate of the 
absolute light curve population integrates over the pos- 
terior uncertainties in the individual light curves and the 
host galaxy dust estimates. 

In Figure El we have distilled some of the information 
in this intrinsic covariance matrix to show the inferred 
intrinsic correlations. For brevity, instead of depicting 
correlations with every differential decline rate df, we 
only show correlations with the canonical 15-day post- 
maximum decline rate in each filter, Amis (J 71 ). The cor- 
relations range from -1 to 1 and are color coded accord- 
ing to strength. The joint uncertainties of the correla- 
tions are computed but not shown. The bottom matrix 
shows the posterior inferences of the correlation matrix 
of peak absolute magnitudes. The optical luminosities 
and light curve shapes are strongly correlated with each 
other, but not with the NIR. The J and H luminosities 
are strongly correlated with each other, but not with the 
optical. Since the NIR luminosities have low intrinsic 
correlation with the optical luminosities, they provide 
independent information on the distance. 

The top left matrix shows the posterior inferences of 
correlation matrix of the Ato 15 decline rates and the peak 
absolute magnitudes in each hlter. The decline rates 
Amis in BVR exhibit strong correlations with peak op- 
tical absolute magnitudes, but they show low correlation 
with peak NIR absolute magnitudes in J and H . The 
decline rates Amis in IJH exhibit little correlation with 
luminosities in any of the optical or near infrared filters. 

The top right matrix shows the posterior estimates of 
the correlations between the Ami5 light curve decline 
rates in each filter. The correlation matrix exhibits a 
band structure, with the largest correlations neighboring 
the diagonal. The decline rate Amis in a particular hl- 
ter is typically most strongly correlated with the Amis 
in filters at neighboring wavelengths. This can be seen 
by examining each row of the correlation matrix. The 
optical decline rates in B and V are strongly or moder- 
ately correlated with each other, but have low correla- 
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Fig. 6. — The intrinsic correlation structure of optical-near in- 
frared peak absolute magnitudes Mp and decline rates Ami5(F). 
The color scale encodes the strength of the correlation between 
any two intrinsic quantities. The marginal mode for each corre- 
lation coefficient is depicted, integrating over the uncertainties of 
individual SN. The posterior uncertainties in the correlations are 
computed but not shown, (top left) Correlations between peak 
intrinsic absolute magnitude and decline rate, (top right) Corre- 
lations between decline rates for each pair of passbands. (bottom) 
Correlations between peak intrinsic absolute magnitudes for each 
pair of filters. 

tion with NIR decline rates. Similarly, the decline rates 
in IJH show strong or moderate correlation with each 
other, but have lower correlation with the decline rates 
in B,V. The lack of strong correlation across the whole 
matrix indicates that the light curve shapes across opti- 
cal and NIR wavelengths are unlikely to be adequately 
modeled with one degree of freedom. 

These matrix plots depict some of the salient popula- 
tion correlation information of the SN la absolute light 
curves captured by the hierarchical model. This inferred 
correlation structure is used by the model to estimate 
luminosities from the light curves and to make distance 
predictions. 

5.3. Posterior Inference of the Host Galaxy Dust 
Population 

5.3.1. Linear Correlation Dust Population Model 

In this section, we describe posterior inferences for the 
host galaxy dust population. From the samples of the 
global posterior, Eq. \T9\ we can estimate the host galaxy 
dust extinction, Ay, and the slope of the extinction law, 
ry = 1/Ry, for each object and their uncertainties from 
their marginal distributions. We also estimate the char- 
acteristics of the dust population through the hyperpa- 
rameters, ta,/3, and a%, while accounting for global un- 
certainties. 

In Figure we display the histogram of the Ay es- 
timates for each SN in the sample, along with the indi- 
vidual marginal estimates and their uncertainties. This 
is compared against an exponential probability distribu- 
tion with the marginal estimate of the extinction scale 
ta = 0.37 ±0.04. There appears to be an overabundance 
of object at Ay > 1.5 compared to the exponential dis- 
tribution. We explore this further in 

In Figure [51 we display the estimates of (A y, R y) for 
the m = 1 population model, described in £12.61 This 
model assumes that the mean trend of ry versus Ay is 



15 




Fig. 7. — The distribution of inferred host galaxy dust extinc- 
tion Ay. The hierarchical model estimates the extinction to each 
SN using the optical and near infrared light curves, and models 
the dust population, (top) The Ay estimates and uncertainties of 
each SN ranked from highest to lowest extinction, (bottom) The 
histogram of the modal Ay estimates plotted against the fitted 
exponential distribution for the dust population. 

linear in Ay. Fitting the hierarchical model then entails 
computing posterior estimates of (Ay, ry) for individual 
objects and the population trend, parameterized by /3, 
of. For SN at low Ay, the ry parameter for each in- 
dividual SN cannot be estimated precisely, since it only 
enters into the extinction model, Eq. |9l and thus, the 
likelihood, multiplied by Ay. For these SN, there is not 
enough information in individual light curves to distin- 
guish between the individual ry estimates, and so the 
model pools them towards the group mean or trend. At 
high Ay, the ry parameter can be estimated more pre- 
cisely for each SN, so they can be individually distin- 
guished. In the top panel, we show the Ay, Ry values for 
each SN for three joint samples from the MCMC chain. A 
joint sample of {A v , R v }, (3, of represents a single prob- 
able realization of these parameters given the data, and 
is labelled by a single color. The Ry estimates at low 
Ay show considerable scatter between samples, reflect- 
ing the underlying uncertainty. At high Ay, there is less 
scatter between individual SN and between samples, re- 
flecting the increased precision for estimating Ry . In the 
bottom panel, each point and error bar represents the 
marginal estimate, averaging over all the MCMC sam- 
ples, of (Ay,Ry) for each SN. 

In Figure [SJ we show the bivariate marginal probabil- 
ity density of the regression parameters (3 — (/?o,/3i), 
obtained from the MCMC samples. The joint mode, and 
the 68% and 95% highest posterior density contours are 
shown. The intercept /3o represents the population mean 
value of ry at vanishing Ay — > 0, and fi\ represents the 
population mean linear trend of ry against Ay. Also 
shown is the value of ry corresponding to the Milky Way 
interstellar average Ry = 3.1. The intercept /3n at van- 
ishing Ay is uncertain, but consistent with the Milky 
Way average within la. The regression slope j3\ is pos- 
itive with zero excluded from the 95% credible region. 
The marginal estimates of each of the regression param- 
eters are listed in Table [TJ The marginal estimate of 
/3o = 0.35 ± 0.05 can be compared against ry — 0.32 for 
the Milky Way average. The characteristic value of Ry 



TABLE 1 

Prediction Errors for Ry scenarios 



Assumptions 


Inferred 


Opt. 


Opt+NIR 


on Ry population 


Hyperparameters 


[mag] 


[mag] 


Ry = 3.1 




0.20 


0.13 


Complete Pooling 


Ry = 1.6 ±0.1 


0.15 


0.13 


No Pooling 




0.16 


0.12 


PP: m = 


Hr 1 = 1.7 ±0.1, 


0.16 


0.12 




o r = 0.04 ± 0.02 






PP: m = 1 


A) = 0.35 ±0.05 


0.15 


0.11 




ft =0.15 ±0.03 








oy = 0.04 ± 0.02 






PP: 4-Steps 


c.f. Table [3] 


0.15 


0.11 



Note. — Optical and Optical+NIR rms prediction errors at 
cz > 3000 km s _1 for different dust population models. Esti- 
mates of hyperparameters are the marginal posterior means and 
standard deviations. The rms prediction errors are the 0.632 boot- 
strap cross-validation estimates. Sampling variance of prediction 
errors is typically ±0.01 mag. 

as Ay — > 0, fi^ 1 , is uncertain because of the difficulty of 
determining Ry for low-extinction objects. The marginal 
posterior density of fl^ 1 has a non-gaussian profile: the 
mean is 2.9, the mode is 2.7, and the interval contain- 
ing 68% of the highest probability density is [2.3,3.3]. 
The marginal probability that Pq 1 < 2 is p — 0.02. The 
marginal estimate of the slope is /3i = 0.15 ± 0.03. This 
is a strong indication of a differential trend of ry vs. Ay 
in the host galaxy dust population of nearby SN. 




4- P(R" 1 |A V )= N < R v I °-36+0.14A v , of=0.04 2 ) - 




Fig. 8. — Apparent correlation between host galaxy dust visual 
extinction Ay and the dust law slope Ry in the sample of SN 
la. This model assumes a dust population where Ry has a linear 
trend with Ay with some rms scatter a r . The linear regression 
coefficients and residual scatter (f3,a^:) are estimated from the 
marginal global posterior distribution, (top) The points of each 
color and the regression relation are different probable realizations 
of the (Ay, Ry) for each SN and dust population hyperparameters, 
/3 and of , obtained from snapshots of the MCMC . The Ry esti- 
mates at low extinctions have more uncertainty than those at high 
extinction, as reflected by the scatter of points with different col- 
ors, (bottom) Averaging over all probable realizations, we plot the 
inferred marginal posterior mode of (Ay,Ry) and their marginal 
uncertainties for each SN with the marginal estimates of the re- 
gression model. When the individual Ry estimates for single SN 
are very uncertain, they tend to be pulled toward the population 
mean value (for its extinction Ay) using partial pooling. The data 
favor an apparent non-zero correlation between Ay and the dust 
slope Ry 1 . SN la light curves with low to moderate extinction are 
consistent with the Milky Way average Ry ~ 3.1 for interstellar 
extinction, but for highly extinguished SN, a low value of Ry < 2 
is favored. 
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TABLE 2 

Inference for "2-Step" Ry Population Model 



0.25 0.3 0.35 0.4. 0.45 0.5 0.55 
P Q (intercept R y as A y -» 0) 

Fig. 9. — Marginal posterior distribution of the linear regression 
coefficients of the dust population model assuming a linear mean 
trend between Ry and Ay. The parameter /3i is the slope of Ry 

against extinction Ay, and /3rj is the population mean value of Ry 1 
in the low extinction limit. The two-dimensional mode is marked, 
and the inner and outer solid black lines contain 68% and 95% 
of the marginal probability, respectively. The posterior estimate 
of /3rj is consistent with Ry = 3.1 (vertical red dashed line) and 
inconsistent with Ry < 2, P(/3o > 0.5) = 0.02. The posterior 
estimate of the regression slope j3j is extremely inconsistent with 
zero (horizontal dashed line). 

These results were consistent when changing the pe- 
culiar velocity dispersion a pcc from 150 to 300 km s _1 . 
The posterior mean of fi^ 1 was 2.8, the mode was 2.5, 
and the 68% interval containing highest probability den- 
sity was [2.1,3.3]. The marginal estimate of the slope is 
fix = 0.14 ± 0.04. 

Figure [10] plots the posterior estimate of the inferred 
optical reddening E(B — V) = As — Ay due to host 
galaxy dust versus the estimated dust extinction Ay, 
assuming the linear correlation model. The reddening 
estimates at Ay > 1.5 favor a Ry — 1.7 reddening law, 
whereas at lower extinction, Ay < 1, the reddening esti- 
mates are consistent with 2.4 < Ry < 3.1. 

5.3.2. Step Function Dust Population Model 

The linear correlation model, m = 1, assumes that the 
mean trend of ry with Ay is linear across the entire range 
of Ay. However, we do not know if this assumption is 
true. To test the sensitivity of the apparent differential 
trend in ry vs. Ay to the linear correlation assump- 
tion of the m — 1 model, we fit alternate models using 
the "Step" Case 6 of $?M Instead of using all the SN 
over the range of Ay to determine a linear correlation 
in the host galaxy dust population, this case groups to- 
gether only SN in the same interval of Ay to determine 
their group mean and variance of ry = Ry 1 in each bin. 
We have re-trained the hierarchical model using the step 
function assumptions, first by dividing the range in Ay 
into "high" (Ay > 0.8) and "low" (Ay < 0.8) extinction 
bins. Second, we subdivided those bins and re-trained 
the model using four bins in extinction, Ay. 

Marginal posterior estimates for the "2-step" model are 
listed in Table[5J The low extinction bin, with Ay < 0.8, 
has a group Ry mean of 2.3 ± 0.3, while the high extinc- 
tion bin, Ay > 0.8 has a group Ry mean of 1.7 ± 0.1. 
We examined the marginal posterior probability of the 



Ay Range 


Mr 1 




ay 


Ptail 


[0.0.8] 
> 0.8 


2.3 ±0.3 
1.7 ±0.1 


0.45 ±0.06 
0.59 ±0.04 


0.04 ±0.02 
0.06 ±0.03 


0.008 



Note. — The hypcrparamctcr fi r is the population mean 
Ry 1 for SN in each interval in Ay. The hyperparameter tr^ 
is the population variance of Ry 1 in each interval. Estimates 
are marginal posterior means and standard deviations. The 
marginal posterior density of cr r is highly non-Gaussian. The 
estimate of fi~^ is not precisely the inverse of the estimate of 
/V, because of uncertainty and the non- linear transformation. 
The marginal probability that fi r is larger than fi r of the high 
extinction bin is ptail- 

difference between the group means /i r at low and high 
extinction. This calculation takes into account the pos- 
terior covariance between the estimates and marginal- 
izes over uncertainties in the other parameters. The tail 
probability that /i r of the low extinction bin is greater 
than \x r of the high extinction bin is denoted ptaili and 
is computed directly from the MCMC samples. We find 
less than 1% probability that the difference is positive, 
suggesting that the difference is significant. 

For the "4-step" model, the posterior inferences of the 
hyperparameters in each of the four intervals in Ay are 
listed in Table [3] The group means, \i r , for each interval 
display the same trend of lower Ry for higher Ay . The 
bin with < Ay < 0.4 has a group mean consistent 
with Ry sa 3, the interstellar average for the Milky Way. 
However, the group mean for the lowest extinction SN is 
uncertain due to the difficult of determining ry at low 
Ay. The marginal posterior density of the chracteristic 
Ry in the lowest extinction bin, , is non-gaussian: the 
mean is 2.9, the mode is 2.5 and the interval containing 
68% of the highest probability density is [2.1,3.3]. The 
marginal probability that /i" 1 of the lowest extinction 
bin is less than 2 is p — 0.04. The highest extinction bin, 
Ay > 1.25, favors a group mean Ry = 1.6 ± 0.1. We 




Dust Extinction A,, 



Fig. 10. — Marginal posterior estimates of inferred color excess 
E(B — V) due to host galaxy dust versus inferred extinction Ay, 
assuming the linear correlation model. This model assumes a dust 
population where Ry has a linear trend with Ay with some rms 
scatter u r The SN la at lower extinction (Ay < 1) have implied 
color excesses consistent with Ry = 2 to 3. The SN la at higher 
extinction favor a dust law with Ry < 2. 
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TABLE 3 

Inference for "4-Step" Ry Population Model 



Ay Range fi r 




<Jr 


Ptail 


[0,0.4 
[0.4,0.8 
[0.8.1.25 
> 1.2E 


2.9 ±0.7 
2.3 ±0.3 
2.1 ±0.2 
1.6±0.1 


0.35 ± 0.08 
0.45 ± 0.06 
0.48 ± 0.04 
0.63 ± 0.03 


0.03 ±0.02 
0.03 ± 0.03 
0.03 ±0.03 
0.04 ±0.03 


< 0.001 

0.005 
0.004 



Note. — The hypcrparamctcr i± r is the population mean R v 
for SN in each interval in Ay. The hypcrparamcter a^. is the pop- 
ulation variance of Ry 1 in each interval. Estimates are marginal 
posterior means and standard deviations. The marginal posterior 
density of oy is highly non-Gaussian. The estimate of fi^ 1 is not 
precisely the inverse of the estimate of [i r , because of uncertainty 
and the non-linear transformation. The marginal probability that 
the mean of each bin is larger than [i r of the highest extinction 
bin is ptail- 

calculated ptail, the marginal probability that the group 
mean \i T of each bin is greater than /i r of the highest 
extinction bin. Each of the lower extinction bins had 
significantly different group mean [i r estimates than that 
of the highest extinction bin. 
For an assumed peculiar velocity dispersion cr pcc = 

300 km s _1 , the results were consistent. For the low- 
est extinction bin, the non-gaussian marginal probability 
density of /i,? 1 had a peak at 2.5, with mean 2.7 and a 
68% interval [2,3]. The posterior mean and standard de- 
viation of in the highest extinction bin was 1.5±0.1. 
The inferences for the alternate step function model con- 
firm the trend seen in the m = 1 linear correlation model. 
We discuss the implications of these findings in SjU 

5.3.3. Other Dust Population Models 

Posterior inferences of the hyperparameters using the 
other models for the host galaxy dust population ( §2.6|) 
are listed in Table [TJ In the case of complete pool- 
ing (CP), in which it is assumed that all SN have the 
same value of Ry, the marginal estimate of that value 
is Ry — 1.6 ± 0.1. In the population model m = 0, 
in which each ry is drawn from a Gaussian with mean 
independent of Ay, we find a population mean with a 
similar value (/ijT 1 = 1.7 ± 0.1). These results indicate 
that the highly extinguished SN dominate the estimate 
of the global constant or population mean in these cases, 
since their individual Ry estimates are the most precise. 
The CP and m = are special cases of the m = 1 model. 
If the m = model were favored then when fitting the 
m = 1 model we should have found /3i « 0. If CP were 
favored then we would have also found that of = 0. We 
inferred none of those in the expanded m = 1 model; this 
illustrates the pitfalls of those simpler assumptions. 

6. MODEL CHECKS 

After fitting the hierarchical model by computing the 
global posterior density, Eq. [121 using our BayeSN code, 
we checked the model fit using several methods. We 
did this to ensure first that the MCMC code was fit- 
ting the assumed statistical model to the data set, and 
to diagnose technical or algorithmic errors. Secondly, we 
checked the fit of the hierarchical model to the observed 
sample to look for disagreements between the assump- 
tions and the observed data. Third, we tested the ro- 
bustness of the model to the training set and evaluated 




Apparent Color X 

Fig. 11. — Comparison of the cumulative distribution of peak 
apparent B — V colors to those of posterior predictive replication 
sets randomly generated from the trained model, (top) When the 
hierarchical model is trained on our full sample, the distribution of 
apparent colors of the observed sample has a thicker tail towards 
the extreme red (B — V > 0.8 — 1 mag) than most of the the 
replicated sets, (bottom) When the hierarchical model is retrained 
on a sample restricted to apparent B — V < 1, the apparent color 
distribution agrees well with replicated SN sets. This suggests 
that the exponential model population distribution for extinction 
inadequately accounts for the number of SN observed at very high 
reddening. 

distance prediction error by performing extensive cross- 
validation ( £I7.2|) . For individual SN, we inspected the fits 
of the light curve model to the photometric data (< i5.ip . 

To check the fit of the model population distributions 
to the apparent distributions of the data set, we per- 
formed posterior predicti ve model checks (jRubinl Il984t 
iGelman et all 119961 [2001. Fr om the trained hierarchi- 
cal model, we generated a new random set of apparent 
light curves of the same size as the observed sample. 
The replicated set was generated by sampling forward 
through the directed acyclic graph, Fig. [TJ The distribu- 
tion of apparent properties of the replicated light curves 
was compared to those of the observed set. We illustrate 
such a comparison of peak apparent B — V optical colors 
in Figure ITT1 We generated 1000 replications, each con- 
taining the same number of SN as the observed sample. 
The apparent colors of the SN within each replicated set 
have a cumulative distribution function. The distribu- 
tion of apparent B — V colors is the convolution of the 
intrinsic B — V color distribution and the dust E(B — V) 
color excess distribution implied by the extinction dis- 
tribution. The set of replications form an ensemble of 
color distributions, reflecting random sampling variation 
and posterior uncertainty in the model. For each value 
of the B—V color we show the median, 2.5% and 97.5% 
quantiles of the ensemble of CDFs at that value. The 
black curve is the CDF of the apparent colors of the ob- 
served data set. If the observed CDF lies outside the 95% 
range of the replications, then the observed distribution 
disagrees with the model's replications. 

In the top panel of Fig. [TTJ the observed distribution 
of peak apparent B — V colors has a significantly thicker 
tail towards redder (positive) colors than the replicated 
distributions. This suggests that the number of very red 
SN with B — V > 1 is large compared to what can be 
expected with the exponential model for the dust pop- 
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ulation. The abundance of very red SN in the nearby 
sample might be a consequence of preferred selection of 
these events for follow-up observation. 

To test whether the model distribution adequately de- 
scribes the SN with less reddening , we removed the 4% 
of SN with apparent B — V > 1 from the training set, 
retrained the whole hierarchical model, and again gen- 
erated posterior predictive replciations (bottom panel, 
Fig. [TTj) . There is good agreement between apparent 
color distributions of the ensemble of replications and 
the observed data set. With the color cut, the estimated 
exponential scale of the extinction distribution decreased 
from t a = 0.37 ± 0.04 mag to r A = 0.28 ± 0.04, so that 
the model captures a dust extinction distribution with 
a thinner tail, which implies a narrower apparent color 
distribution. We found that (3 was consistent within the 
uncertainties with the values found by using the whole 
sample. This demonstrates that the trend is not deter- 
mined just by the reddest outliers of the SN sample. 

A key assumption of the model is that the two popu- 
lations, the SN la light curves and the dust extinction, 
are statistically independent. This entails that an in- 
trinsically faint or red supernova has the same chance of 
encountering a particular level of host galaxy dust ex- 
tinction as an intrinsically bright or blue supernova. We 
expect that the amount of extinction to SN should be un- 
correlated with the intrinsic properties of the SN la light 
curves. A significantly non-zero relationship between the 
two might indicate a miscalibration of the model, possi- 
bly related to a confusion between intrinsic color varia- 
tion and dust extinction. We tested this hypothesis, as 
shown in Figure 1121 where we plot the fitted intrinsic 
Ami5(f?) decline rates and the inferred intrinsic B — I 
color of SN la light curves versus the inferred dust ex- 
tinction. The plots shows the expected lack of correlation 
between the parameters from the two populations, and 
is a consistency check on the model fit. 

7. DISTANCE PREDICTION 
7.1. Hubble Residuals under Resubstitution 
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Fig. 12. — (top) The fitted intrinsic Amis(B) decline rate of SN 
la light curves versus inferred dust extinction. There is no sig- 
nificant trend between Amis(B) and Ay. (bottom) The inferred 
intrinsic B — I color at peak versus the inferred dust extinction. 
There is no significant trend between peak intrinsic B — I and Ay. 



After training the model on all the SN in the sample 
(D,Z), the distance modulus for each SN can be esti- 
mated by re-substituting each light curve set into the 
model, and computing the posterior predictive density 
P(fj>s\ T^s, z s ; T>, Z), which marginalizes over the uncer- 
tainty in the trained model. The expected value of this 
density is jUL su u- The Hubble residual is the difference 
between the resubstitution distance modulus and the dis- 
tance modulus expected from the redshift and the Hubble 
law, f(z s ) = ~E(/j, s \z s ). The estimates M(/j, s \z s ), a^^ and 
Mrcsub are listed in Table [5] The uncertainty-weighted 
mean square resubstitution error, e rr^„„ K is computed a s 
a sum over all SN, using Eq. 31 of MandeLeFaLl d2009h . 
For the m — 1 dust model, the error- weighted rms of the 
Hubble residuals at cz > 3000 km s _1 is 0.13 mag for 
the full sample. However, for the SN with NIR data, the 
resubstitution error at cz > 3000 km s _1 is 0.10 mag, 
and those with only optical data have a resubstitution 
error of 0.14 mag. 

7.2. Cross-Validation & Prediction Error 

For finite samples, the rms Hubble diagram residual 
of the training set SN is an optimistic estimate of the 
ability of the statistical model to make accurate distance 
predictions given the supernova observables. This is be- 
cause it uses the supernova data twice: first for esti- 
mating the model parameters (training), and second for 
evaluating the residual error. To evaluate predictive per- 
formance and guard against over-fitting with a statisti- 
cal model based on finite data, we should estimate the 
prediction error for SN not included in the training set 
("out-of-sample"). We use cross-validation (CV) to eval- 
uate the utility of optical and NIR light curves for accu- 
rately predicting distances in the Hubble diagram, and 
to test the sensitivity of the model to the finite training 
set. The importance of cross- validating statistical mod- 
els for predict i ng SN la dist ances has been discussed by 
IMandel et all (|2009t l and iBlondin. Mandel. fc Kirshnerl 
(|2011D . 

To estimate the distance prediction error of our sta- 
tistical model, we have performed bootstrap cross- 
validation. This method was first used for assessing 
distance pred i ctions of SN la light curve models by 
IMandel et all 1)20091 ). From the full set of SN, a new 
bootstrapped training set is created by sampling with re- 
placement individual SN up to the same size as the origi- 
nal set. The complement of this new training set forms a 
validation or prediction set. The training set light curves 
and redshifts are used to build the statistical model for 
SN la light curves, with the model hyperparameters esti- 
mated using hierarchical Bayesian inference and MCMC. 
The prediction set light curves are used to generate dis- 
tance predictions for those SN, which are then compared 
to the Hubble distances expected from their redshifts. 

We randomly bootstrapped 30 training sets, so that 
on average each SN was held out for distance predic- 
tion 11 times. For each SN s, the expected value of 
the posterior predictive probability density, /Xp rcd B = 

E(/2 S |X> S , z s ;T> B , Z B ), is a point estimate of the distance 
modulus prediction under the training set data V B , Z B 
for training set B. The .632 b ootstrap estimate (Efron 
[l98llEfron fc TibshiranilH997l) of rms prediction error is 
computed using the sum of uncertainty- weighted squared 
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Fig. 13. — Cross- validated Hubble diagram computed with 
BayeSN for the low-z nearby set of CfA and literature SN. Red 
points indicate the SN with joint optical BVRI and near infrared 
JH data. Black points have only optical data. The dashed (dot- 
ted) line indicates the magnitude uncertainty in fJ.(z) for cr pcc = 150 
(300) km s _1 . We perform cross-validation with 30 bootstrapped 
training sets to estimate the out-of-sample prediction error and 
test the sensitivity of the model predictions to the finite sample. 
The rms prediction error in distance modulus for SN with optical 
light curve data only at cz > 3000 km s" 1 is 0.16 mag. The SN 
with optical and near infrared light curve data have an rms pre- 
diction error at cz > 3000 km s _1 of 0.11 mag. The maximum 
likelihood estimate of the rms intrinsic prediction error is shown 
in parentheses, assuming a pcc = 150 km s —1 . For a velocity dis- 
persion crpcc = 300 km s —1 , the total rms prediction error remains 
the same, but the scatter attributed to intrinsic prediction error is 
0.129 ± 0.016 mag for SN with optical data only, or 0.081 ± 0.026 
mag for SN with optical and NIR, light curve data. These results 
indicate that one can make more accurate distance predictions with 
SN la with combined optical and NIR data than with optical data 
alone. 

prediction errors over all b ootstrapped sets, as described 
by Equations 32 and 33 of iMandel et all (|2009f ). For the 
m = 1 dust model, we list in Table [5] we list the pre- 
dicted distance modulus for each SN, averaged over all 
the training sets B that do not include that SN, as /2 pre d- 
The standard deviation of predictive uncertainty, i.e. the 
square root of Var[/i s | T> s ,z s ; T) B ,Z B ], averaged over the 
training sets B not containing that SN, is <7 prod . This 
measures the precision with which the trained model 
makes a distance prediction for a particular SN. The 
standard deviation of the point estimates /ip red B over all 
of those training sets, s prc d, is a measure of the sensitiv- 
ity of the predicted distances to resampling the training 
set. 

Figure [T31 shows the predicted distances to the SN us- 
ing bootstrap cross-validation. For Hubble flow SN at 
cz > 3000 km s _1 , the cross- validated prediction error is 
0.15 mag overall. For the SN with optical and NIR data, 
the prediction error is estimated to be 0.11 mag, and for 
SN with optical light curves alone, the rms prediction 
error is 0.16 mag. The predicted distances to SN with 
optical and NIR light curve measurements have a smaller 
scatter in the Hubble diagram than those with only op- 
tical data. These estimates of Hubble diagram scatter 
can be compared to the 0.18-0.22 mag rms found for the 
CfA3 sample using t he MLCS2k2 and SALT2 methods 
(|Hicken et al.ll2009bD . 

The weighted rms prediction error measures the to- 



tal Hubble diagram scatter, comprised of at least two 
components: a dispersion associated with unknown and 
random peculiar velocities with respect to the Hubble ex- 
pansion, and an intrinsic variance that represents a floor 
to the precision of distance predictions. We compute 
this intrinsic component of the prediction error using the 
m aximum likelihood estimator describ ed in Appendix B 
of iBlondin. Mandel. fc Kirshnerl (|2011[) . Assuming a ve- 
locity dispersion er pcc = 150 km s _1 , the rms intrinsic 
prediction error was 0.15 ± 0.01 mag for SN with optical 
data only, and 0.10±0.02 for SN with optical and near in- 
frared light curves. For er pec = 300 km s _1 , the weighted 
rms prediction error remains the same, but the scatter 
attributed to intrinsic prediction error is 0.13 ±0.02 mag 
for SN with optical data only, or 0.08 ± 0.03 mag for SN 
with optical and NIR light curve data. These estimates 
of intrinsic prediction error are smaller when a larger cr pec 
is assumed because more of the Hubble diagram scatter 
is attributed to random galaxy motions. 

The predictive variance <7p rcd measures the uncertainty 
with which the hierarchical model predicts the distance 
modulus of each individual SN, after marginalizing over 
the uncertainties in the training set, SN la and dust pop- 
ulations. Figure [14] shows the distributions of predictive 
uncertainties (standard deviations) in the individual SN 
distance moduli predicted from this model. The cumula- 
tive distribution of the predictive posterior standard de- 
viations for SN with optical light curve data only is com- 
pared to that of SN with optical and NIR data. For SN 
la with joint optical and NIR data, the predictive uncer- 
tainties are typically between 0.10 and 0.12 mag, whereas 
for SN la with optical data only, the uncertainties mostly 
lie between 0.12 and 0.16 mag. A simple Kolmogorov- 
Smirnov test verifies that these precision distributions 
are inconsistent. This demonstrates that the hierarchi- 
cal model estimates the distances to SN la with optical 
and NIR light curves with smaller uncertainty than those 
of SN la with only optical data. 

The sample variance of the distance predictions for a 
single SN over bootstrapped training sets, Sp rcd in Table 
[51 is always much smaller than the uncertainty variance 
(Tp red of a single prediction, and is smaller than the mean 
square error over the set of SN in the Hubble diagram. 
The typical value of s prcd over the set of SN is ~ 0.03 
mag. This demonstrates that our model's distance pre- 
dictions to individual SN are fairly robust to perturbing 
the composition of the training set; the sensitivity of pre- 
dictions to resampling is of order a few hundredths of a 
magnitude. With a larger set of optical and NIR light 
curves, this sensitivity could be reduced further. 

We examined the cross-validation prediction errors to 
check for systematic trends against observable or inferred 
quantities, as possible signs of model misfit. In Figure [TBI 
we show a scatter plot of the prediction error for each SN 
versus an observable or inferred quantity. We find no sig- 
nificant trends of prediction error versus predicted dust 
extinction Ay, the apparent optical colors at peak (e.g. 
B — V), apparent optical- near infrared colors at peak 
(e.g. V — H), or optical light curve shape, summarized 
by the canonical Amis(-B). Linear regressions fit to the 
prediction errors versus each quantity yield both slopes 
and intercepts that are statistically consistent with zero. 
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Fig. 14. — Cumulative distributions of the marginal predictive 
uncertainties of distance moduli of SN la. The uncertainty in the 
predicted distance modulus is represented by a probability density 
after integrating over the other uncertainties in the dust estimates, 
light curve fits and iC-corrections, and the population. We show 
the CDFs of the standard deviations of the predictive probability 
distributions of the individual SN distance moduli. The SN with 
optical and NIR light curve measurements (red) typically have a 
smaller distance uncertainties (higher precision) than those with 
only optical light curve data (blue). The dashed lines represent 
95% confidence intervals of the respective CDFs. The two distribu- 
tions are highly discrepant according to the Kolmogorov-Smirnov 
test. Using combined optical and NIR light curve data, the hier- 
archical model makes distance predictions with smaller estimated 
uncertainty and higher precision than it does with optical data 
alone. 

7.3. Distance Error Comparison with CSP Light Curves 

We augmented our SN sample with 27 nearby SN 
recently published by the Carnegie Supernova Project 
(|Contreras et al-lfeOlOD . and again performed the cross- 
validations to produce predictions for each SN. There 
were 10 SN that were contained both in the CSP sample 
and the CfA3+PAIRITEL sample. To avoid including 
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Fig. 15. — Cross- validated distance prediction errors fJ. pTe d — fJ-(z) 
versus observed and inferred SN quantities of interest. The distance 
modulus prediction errors of the model, averaged over 20 boot- 
strapped training sets, do not show statistically significant trends 
with respect to host galaxy dust extinction Ay, apparent optical 
color B — V , apparent optical-near infrared color V — H or optical 
decline rate Amis(i?). Fitted regressions have slopes consistent 
with zero (blue). 
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Fig. 16. — Cross-validated Hubble Diagram computed with 
BayeSN for the low-z nearby training set using CfA, CSP and liter- 
ature SN. Red points indicate the SN with joint optical BVRI and 
near infrared JH data. Black points are SN with only optical data. 
The dashed (dotted) line indicates the magnitude uncertainty in 
^1(2) for (Tp ec = 150 (300) km s _1 . We perform bootstrap cross- 
validation to estimate the out-of-sample prediction error and test 
the sensitivity of the model predictions to the finite s amp le. The 
rms prediction errors are consistent with those of Fig. 1131 

duplicate light curves for the same SN in the joint sam- 
ple, we selected the CSP light curves in those cases, since 
this resulted in the retention of the most optical and NIR 
data. We recomputed training and prediction under the 
m = 1 dust population model. The Hubble diagram of 
these distance predictions is shown in Figure 1161 The 
results are consistent with the previous Hubble diagram: 
the total rms dispersion at cz > 3000 km s _1 was 0.15 
for SN with optical data only, and 0.11 for SN with op- 
tical and near infrared light curves. We compared the 
distribution of distance modulus errors /iprod — f(z) for 
Hubble flow SN with optical and NIR data from the CSP 
sample with that of the "CfA+literature" sample. Using 
a two-sample Kolmogorov-Smirnov test we cannot rule 
out that they are from the same distribution (p = 0.88). 
The distance predictions of each set are statistically con- 
sistent, so it is reasonable to analyze the combined set. 

7.4. Cross-validation with Different Ry Assumptions 

In this section, we investigate the effect of different 
model assumptions about the dust population on dis- 
tance predictions. For each case of $2.61 we computed 
cross- validated distance predictions for SN in the Hubble 
flow at cz > 3000 km s" 1 by generating 20 bootstrapped 
sets for training and predicting the distance moduli for 
the complementary validation set. For these computa- 
tions, we used the "CfA+CSP+literature" sample of 127 
SN la. Table [1] displays the results of these calculations, 
including the marginal posterior estimates of the hyper- 
parameters in each case, and the 0.632 estimate of total 
prediction error for the SN with optical light curves only, 
and for the SN with both optical and near infrared data. 

The case with fixed Ry = 3.1 (the Milky Way inter- 
stellar average) for all SN leads to the worst distance 
predictions (0.20 mag for optical, 0.13 mag for optical 
and near infrared). The cases of complete pooling (all 
SN have Rv with the same value) or partial pooling with 
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m = (the Rys come from a population independent of 
the Ay value), or even no pooling (each Ry estimated 
independently for each SN), rms prediction errors are 
about 0.15 mag to 0.16 mag for optical light curves only, 
and 0.12-0.13 mag for optical plus NIR light curves. 

If we model a potential population correlation between 
Ay and RZ , using either linear or step function models, 
we find the smallest cross-validated distance prediction 
errors, both for SN with optical data only (0.15 mag), 
and for the SN with optical and NIR light curves (0.11 
mag). These are both significant improvements over the 
rms prediction errors under the assumption that Ry = 
3.1 has the mean value for Milky Way interstellar dust, 
and are also marginally better than those of the other 
cases. However, the statistical sampling uncertainty is 
about ±0.01 mag, so it is difficult to draw significant 
distinctions between the rms cross-validated prediction 
errors of the latter five cases. An analysis of a larger, 
future sample of optical and NIR light curves will help 
to further discriminate between these competing cases. 

It is notable that the change in the rms distance modu- 
lus prediction error for SN with optical light curves alone 
is 0.05 mag between the worst case and best case dust 
population models, whereas this change for SN with opti- 
cal and NIR light curves is only 0.02 mag. This highlights 
the advantage of including the NIR data; since the H- 
band provides a good standard candle, the model can rely 
mostly on the NIR light curves to provide distance esti- 
mates that are both less vulnerable to host galaxy dust, 
and less sensitive to the assumptions about the dust. 

7.5. Improving Constraints on Dust and Distance with 
Optical and NIR Data 

In this section, we demonstrate the effect of using NIR 
light curve observations in conjunction with optical data 
for constraining extinction and for making more precise 
predictions. In Figure [17] we show the posterior pre- 
dictive densities for the distance modulus and the joint 
probability densities for SN 2002bo, an event with high 
extinction. With the trained probability model, we com- 
puted the joint probability P(fi, Ay \ T> s , z s ; T>, Z) under 
prediction where the light curve data T> s alternately in- 
cluded the SN 2002bo observations in the BV, BVRI, 
or BVRIJH filters. Recall that, under prediction, the 
tildcd redshift z s is only used for i-T-corrections and Milky 
Way extinction, but not in the redshift-distance likeli- 
hood function. The dataset used for training is denoted 
T>, Z. We also compute the marginal posterior predictive 
probability P(/2| T> s , z s \ T>, Z) for each case. The proba- 
bility density in (jl, Ay) integrates over the uncertainties 
in the SN light curve fit, and the dust and SN popula- 
tions. The marginal density in jl additionally integrates 
over the uncertainty in Ay, as a "nuisance" parameter. 
These marginal probability densities were computed di- 
rectly from the MCMC samples, obtained under predic- 
tion, using kernel density estimation. They are not Gaus- 
sian approximations of the posterior probability density. 

For comparison we mark the expected distance modu- 
lus for the observed redshift, and its expected magnitude 
uncertainty for a 300 km s _1 velocity dispersion. The 
marginal probability density in fi integrates to one, so 
that the taller pdfs make the most precise distance pre- 
dictions, and the shorter pdfs make the most uncertain 




Fig. 17. — The effect of adding NIR light curve data on sta- 
tistical uncertainties on distance modulus fi and extinction Ay 
for SN 2002bo. (left) The predictive probability density of /i us- 
ing BV light curve data only (blue), BVRI data only (green), 
and BVRIJH data (red), computed from the trained optical- 
near infrared statistical model. The pdfs integrate to one, so that 
more precise predictions are taller, and less precise predictions are 
broader. The black vertical lines indicate the expected value of fi 
given the redshift and the associated error of ±300 km/s peculiar 
velocity dispersion, (right) The predictive joint probability density 
of (fj,, Ay) using optical or optical and near infrared data. The two- 
dimensional modes are marked, and the inner and outer contours 
contain 68% and 95% of the highest probability regions. Whereas 
with BV data only, the distance is uncertain due to the uncertain 
extinction by host galaxy dust, with BVRIJH data, the uncertain- 
ties in extinction, and thus in distance, are reduced significantly. 

predictions. The figure demonstrates that adding obser- 
vations in the redward filters greatly improves the pre- 
cision of distance predictions, with the full optical and 
NIR data set yielding the greatest precision with this 
model. We also show the mode and 68% and 95% high- 
est probability density regions of the joint probability 
P{jj,,Ay\V s ,~z s ;V,Z). When only the blue BV data is 
used, there is a strong degeneracy between the uncer- 
tainty in distance modulus and the uncertainty in host 
galaxy dust extinction. Both the uncertainty in the dis- 
tance modulus and in dust extinction are reduced when 
we condition on the available NIR light curve observa- 
tions. The predictive precision (the inverse variance) of 
the distance modulus of an individual SN is, on average, 
improved by a factor of 2.2 using BVRI and by a fac- 
tor of 3.6 using BVRIJH data, compared to using BV 
light curves alone. The predictive precision is improved 
by 60%, on average, using optical and NIR BVRIJH 
data versus optical BVRI data alone, and can be im- 
proved by up to a factor of 2.6, based on the current 
sample. 

In Figure [HI we illustrate these inferences for a differ- 
ent event, SN 2005el. This supernova appears to have 
near zero host galaxy extinction. However, even with 
near zero dust extinction there is uncertainty in Ay due 
to the intrinsic variance of supernova colors. This un- 
certainty is in the direction of positive extinction, and 
hence the joint distribution P(Ji, Ay\T> s ,z s ;T> ,Z) ap- 
pears non-Gaussian. Although this event has close to 
zero extinction, there is still a strong degeneracy in the 
uncertainties between /z and Ay under prediction with 
the BV data alone. The combination of optical and near 
infrared data constrains this joint uncertainty and yields 
improved precision of distance predictions even for low 
extinction events. 

8. DISCUSSION & CONCLUSION 

We have constructed a comprehensive hierarchical 
model for Type la SN light curves in the optical and 
near infrared (BVRIJH). We model the apparent light 
curves as the sum of random draws from an absolute light 



22 



curve population distribution and from a host galaxy 
population distribution, plus the distance moduli. While 
fitting the individual SN la light curves, we also estimate 
the characteristics of the two populations. These include 
the intrinsic correlation structure of the absolute light 
curves, and the joint distribution of extinction Ay and 
the slope of the dust law Ry in SN la host galaxies. The 
application of our new BayeSN MCMC algorithm en- 
ables coherent probabilistic inference of the unknown pa- 
rameters and hyperparameters given the observed data. 
We also use it to generate distance predictions for SN 
while marginalizing over the uncertainties in the popula- 
tion models and training set. 

The inferred correlation matrices of the intrinsic light 
curve properties ( ^5.2.21) show that the peak optical ab- 
solute magnitudes (BVRI) are strongly correlated with 
each other, but have weaker correlation with the J and 
H near infrared absolute magnitudes. Similarly, while 
the peak optical absolute magnitudes are correlated with 
optical decline rates (particularly Ami5(B)), they have 
low correlation with the NIR decline rates. The near in- 
frared absolute magnitudes exhibit low correlation with 
the optical decline rates. This indicates that the NIR 
light curves provide independent information on the lu- 
minosities of SN la, which can be leveraged to improve 
the precision of distance estimates. 

We inferred the distribution of host galaxy extinction 
Ay, with an average value of ta = 0.37 ± 0.04 for our 
nearby sample. However, the exponential dust distri- 
bution does not adequately fit the fat tail of the peak 
apparent color distribution: excluding the 4% of SN in 
the extreme red tail, the apparent colors and dust ex- 
tinctions of the other 96% of SN are well described by 
an exponential distribution in Ay with ta — 0.28 ± 0.04 
and an intrinsic color distribution. Using both linear 
and step function models ( §2.6[) . we modeled and in- 
ferred the joint distribution of Ay and the extinction law 
slope parameter, Ry, and found strong evidence for an 
apparent correlation. Under the assumption of a linear 
trend between Ry 1 and Ay, we found a positive slope. 
In the limit of low extinction, the marginal estimate of 
Ry ~ 2.8 ± 0.5 is consistent with the Milky Way inter- 
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Fig. 18. — The effect of adding NIR light curve data on statisti- 
cal uncertainties on distance modulus /i and extinction Ay for SN 
2005el. (left) The predictive probability density of fi using BV light 
curve data only (blue), BVRI data only (green), and BVRIJH 
data (red), computed from the trained optical- near infrared statis- 
tical model. The pdfs integrate to one, so that more precise predic- 
tions are taller, and less precise predictions are broader. The black 
vertical lines indicate the expectation value of fi given the redshift 
and the associated error of ±300 km/s peculiar velocity dispersion, 
(right) The predictive joint probability density of (fi, Ay) using op- 
tical or optical and near infrared data. The two-dimensional modes 
are marked, and the inner and outer contours contain 68% and 95% 
of the highest probability regions. Although this SN most likely has 
little extinction, the addition of the NIR data still helps to improve 
the constraints on Ay and the precision of distance predictions. 



stellar average, Ry = 3.1, and with i ndependent mea- 
surements o f dust in external galaxies (|Finkelman et al.1 
120081 120101 R v = 2.8). For SN with very high extinc- 
tion Ay > 1, values of Ry « 1.7 are favored. However, 
we do not know if the linear assumption is valid over the 
whole range of Ay, so we have explored alternative mod- 
els for the potential differential behavior of Ry. Under 
the assumption of a "step" function model that groups 
together SN in four bins in Ay, the characteristic Ry in 
the lowest extinction bin (Ay < 0.4) had a modal value 
of 2.5 and mean values 2.7-2.9 and for the highest extinc- 
tion events (Ay > 1.25), we found Ry = 1.6 ± 0.1. We 
find that these differences are statistically significant. 

These results suggests that SN at low extinction are 
seen through lines of sight with "normal" interstellar 
dust, but SN at high extinctions are seen through dust 
with a steeper reddening law. This may indicate a cir- 
cumstellar dust component dominat ing the abso rption 
of light to hi gh extinction events. iWand (|2005[ ) and 
iGoobarl (|2008l ) suggested that scattering of SN light by 
circums t ellar d ust clouds could lead to low values of Ry . 
IGoobarl (|2008f ) calculated that multiple scattering of SN 
light by dust in the locality of the SN would attenuate 
short wavelength p hotons and steepe n the extinction law 
to R v ~ 1.5-2.5. iPatat et all (|2007ft reported the detec- 
tion of spectroscopic signatures of circumstellar material 
around SN 2006X, a highly extinguished SN la. The ef- 
fects of circumstellar dust might provide an explanation 
for the unusual colors of some high extinction events. 

With consideration to the uncertainties of our infer- 
ences and model assumptions, a conservative conclusion 
is that most SN in our sample are affected by host galaxy 
dust with Ry in the range of 2 to 3. These SN are extin- 
guished by Ay < 1 mag. At higher extinctions, Ay > 1, 
the SN are obscured by dust with Ry in the range of 1.5 
to 2, although it is also possible that those SN have differ- 
ent intrinsic colors than the general population. Notably, 
we do not find Ry values lower than 1.5. This is at odds 
with the Ry values between 1 and 2, man y of which were 
below 1.5, fit for their whole sample by iFolatelli et all 
(2010) by minimizing the scatter in the Hubble diagram 
of CSP supernovae. It is also at variance with similarly 
low Ry values found in the recent literature. However, 
those analyses assume that a single Ry value applies to 
all the SN for a given fit, while we have allowed for dis- 
tributions in Ry 1 that may be dependent on Ay. If we 
assume that every SN in our sample has exactly the same 
Ry (complete pooling), or that the distribution of Ry 1 
has a single mean independent of Ay (m = 0), we also 
find Ry — 1.6 — 1.7. This suggests that these assump- 
tions lead to estimates of Ry that are biased towards 
smaller values. This is not surprising, since Ry is best 
determined for SN with high Ay, and as these SN also 
have apparently low Ry, they dominate the estimate of 
a global constant or average. 

Differences are also likely to arise from the treatment 
of the intrinsic covariance structure of the SN la light 
curves. In this paper, we have modeled the intrinsic 
covariances between the absolute light curves in optical 
and NIR wavelengths spanning —12 to 45 days in phase, 
and estimated them by probabilistically de-convolving 
the apparent distributions using a hierarchical model. 
Posterior estimates of Ay , Ry for each SN and all other 
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parameters were obtained via Eq. 1191 and by marginaliz- 
ing over uncertainties. In particular, as estimates of Ry 
for individual SN at low extinction are difficult to de- 
termine exactly, it is necessary to marginalize over that 
uncertainty when making inferences and predictions, as 
we do as part of the Bayesian inference. The inclusion of 
the NIR light curve data yields an added benefit. Since 
the peak H-hand absolute magnitude is a good standard 
candle by itself, it is used to predict precise distances 
that are less susceptible to error from the dust estimate 
or the dust model. For example, for a single SN with 
average extinction, Ay w 0.3 mag, the change in Ah 
between Ry = 1.7 and Ry = 3.1 is about 0.02 mag. 

The linear and step function models of the joint distri- 
bution of {Ay, ry) both suggest that the average effective 
Ry at a given level of Ay decreases gradually with the 
increasing host galaxy extinction. We might speculate 
on the existence of two kinds of host galaxy dust with 
two different reddening laws over wavelength. One would 
correspond to "normal" interstellar dust as found in the 
Milky Way, Ry « 3, and the other would correspond 
to some kind of circumstellar dust with a reddening law 
with Ry w 1.7. If the dust affecting each SN is comprised 
of random amounts of these two types of dust, then the 
effective ry would roughly be an extinction- weighted av- 
erage of the characteristic rys of their respective redden- 
ing profiles. If the circumstellar component was associ- 
ated with highly dusty environments, then this mixture 
could generate an apparent trend of effective ry against 
total extinction. This suggests an extension of our hier- 
archical model, which we will address in a future work. 

Using bootstrap cross-validation, we have randomized 
the optical and near infrared training set to generate 
probabilistic estimates of the distance moduli to out-of- 
sample SN. Comparing these to the distances expected 
from the Hubble expansion, we found a total rms pre- 
diction error of 0.16 mag (at cz > 3000 km s _1 ) for SN 
with optical light curves (BVRI) only, but a total rms 
error of 0.11 mag for SN with optical and near infrared 
(BVRI J H) light curves. After accounting for the dis- 
persion expected from random peculiar velocities with 
°pcc = (150,300) km s _1 the rms intrinsic prediction er- 



rors for these subsets were (0.15 ± 0.01, 0.13 ± 0.02) mag 
for optical and (0.10 ± 0.02, 0.08 ± 0.03) mag for optical 
and NIR. This demonstrates that distances to SN la ob- 
served in the optical and near infrared can be estimated 
with about twice the accuracy (~ [0.15/0.10] 2 ) of SN la 
observed in the optical alone. By conditioning on light 
curve data subsets (BV, BVRI, BVRIJH) for individ- 
ual SN, we show that including near infrared light curve 
data tightens the constraints on host galaxy extinction 
and distance predictions (i jT. 5[) . 

The number of published optical and near infrared 
light curves of SN la is still small compared to the sam- 
ple of optically observed events. Future, larger samples of 
SN la with accurate, joint optical and near-infrared pho- 
tometry will help test and build the statistical strength 
of our conclusions on the utility of combining optical and 
NIR light curves for improving distance predictions and 
will help illuminate the nature of the dust in SN la host 
galaxies. In addition to estimating the intrinsic correla- 
tion structure of SN la light curves and the distribution 
of host galaxy dust, our hierarchical framework can be 
applied to distance prediction and analysis of a cosmo- 
logical sample of SN la. Cosmological samples of SN 
la observed in the rest-frame NIR are possible. The im- 
proved precision and accuracy of the inferences about the 
history of cosmic expansion may justify the extra effort 
required to obtain these data now with the Hubble Space 
Telescope, soon with the James Webb Space Telescope, 
and eventually with the WFIRST mission. 
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APPENDIX 

A. DIFFERENTIAL DECLINE RATES LIGHT CURVE MODEL 

The continuous normalized light curve is equivalent to the specification of the total decline rates of the apparent 
light curve. Let D F (t) be the total decline rate from phase zero to phase t: D F (t) = LC F (i) — Fq = l F (t). For example 
the well-known i?-band decline rate from peak to 15d past maximum is Am 15 (5) = D B (15) flPhillipslll993T) . 

For practical purposes, it is necessary to parameterize the continuous model for the light curve with a discrete set 
of variables. Let r be a grid in rest-frame phase. The total decline rates to each discrete grid point, D F — {D F = 

D F (rj)}, defines the normalized F-band light curve l F (t) at all times if we choose a suitable interpolation rule. If we 
choose a natural cubic spline we ensure continuity up to two derivatives, and the normalized light curve is then linear 
in the total decline rates: l F (t) = s(t, r) • D F and the linear smoother s(t, r) is specified. The differential decline rate 
is d F = D F — Dj^i- The total decline rates at the knots of r are sums of the differential decline rates over the span 

in phase. There is a simple constant matrix G so that D F = Gd F , where d F is the vector of differential decline rates 
d F . The model for the normalized light curve in band F is linear in the differential decline rates: l F (t) = s(t, t) ■ GdF . 

This Differential Decline Rates representation is a special case of Eq. [TJ with l F (t) = 0, l F (t) = s(t,r) ■ G, 
6 F = 6 F = d F , and 6 F ] L — 0. We use an irrregular grid t in phase spanning — lOd to 45d. The knots are placed more 
densely near phases where we expect the most observations (near phase zero) and where we expect more curvature of 
light curves in certain bands. 
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B. ^-CORRECTIONS AND MILKY WAY EXTINCTION 

The observed spectral energy distribution (SED) of a SN la changes relative to a fixed observer frame passband due 
to the effect of cosmological redshift and extinction in the Milky Way varying with line of sight. To account for these 
differences in flux, we derive if-corrections and Galactic extinctions for type la supernovae in observer-frame optical 
and NIR filters. As a high-quality SED time sequence is sel dom available for a ll supernovae, particularly at higher 
redshifts, we use the average spectral template sequence from lHsiao et al.l (|2007f) . 

We compute if-corrections for optical and NIR passbands following the method of iNugent. Kim, fc Perlmutterl 
(2002). We model the UBVRI filters with the "shifted" Bessell passbands (jKessler et al.ll2009D and the JHK* filters, 
with the 2MASS passbands. We choose a standard color that includes each passband (U:U — B, B:B — V, V:V — R, 
R:R — I, I:R — I, J:J — H, H:J — H, K K :H — K x ) and for each model filter, we warp the SED sequence using 
the Ry — 3.1 extinction law (jCardelli et al.l IT989f ) to take on a wide range of the corresponding rest-frame color, 
measured with synthetic photometry. For a given observer frame passband and redshift we determine the rest-frame 
passband with the nearest effective wavelength. The warped SED series is then redshifted and used to determine the 
i^-correction as a function of rest-frame color. This procedure does not account for spectral features that vary with 
lightcurve shape and there is no constraint on the SED blueward or redward of the bluest or reddest model filter. 

To compute the M i lky W ay Galactic extinction, we follow a procedure, modified from that outlined in 
Uha. Riess. fc Kirshnerl pOOl . For a given model filter, we use the warped SED sequence constructed for the K- 
corrections and determine the unextinguished observer-frame magnitude with synthetic photometry as a function of 
phase and rest- frame color. The sequence is then reddened with a Rv =3.1 law for a range of values of the Milky Way 
reddening £mw = E{B — V) and the extinguished observer frame magnitude is computed. The difference between 
the two magnitudes is the Galactic extinction. We find that the Galactic extinction for a given phase t and passband 
X is well modelled by a quadratic in Pmw: Ax(t) — [cex(t, c) + fix(t, c)£mw]£mw- We solve for the polynomial co- 
efficients ax and fix for all the phases t and rest-frame apparent color c. Further, as the rest-frame color dependence 
was introduced by warping the same spectral sequence, we find the co-efficients ax and ftx to be smoothly varying 
functions of rest-frame apparent color c. We model the slope and intercept by polynomials of rest-frame color with 
degree 4 and 5, respectively. We can thereby reduce the Galactic extinction to a si mple set of polynomi al coefficients 
of color tabulated with phase. The value of Pmw for each SN is obtained from the iSchlegel et all (|1998j ) maps. 

C. SPECIFICATION OF THE HYPERPRIOR 

There are two populations in this hierarchical model: the multi-band light curve distribution, and the host galaxy 
dust extinction distribution for Ay and Ry- The hyperparameters of the SN la light curves are population mean /x^ 
and the covariances The hyperparameters of the dust populations are ta, P and of (e.g. for Case 5 in §2.6|) . 
We must make explicit our priors on these hyperparameters, i.e. hyperpriors. At the highest level of the hierarchical 
model, we use diffuse, or "non-informative" prior distributions by default. 

The dust population hyperprior is P(ta, fl, of) = P(ta)P(P\ °f)P(of ). We adopt uniform prior P(/3|of) oc 1. For 
ta and of, we use the standard non-informative prior for positive scale parameters, P (log ta) oc 1, P(logof ) oc 1. 

The hyperprior on the absolute light curve distribution hyperparameters can be conditionally decomposed: 
P((ij,,Hji) — P(jU 1 /,|S^,)P(S^,). We assume a uniform P(fJ.^\ S^,) oc 1. For P(S^), we require a diffuse density that 
has support only on the space of symmetric, positive definite, and invertible matrices. We employ the standard inverse 
Wishart distribution, which is conjugate to the normal covariance matrix: P(£^) = Inv-Wishart„ (E,/,| Ao) X f(tr^). 
The inverse Wishart density is multiplied by a smooth density on the variances f(cr^) described below. The degrees 
of freedom parameter is set to vq = K + 1 where K = dim(-0). This guarantees that the marginal prior density of any 

individual correlation p(ipi,ipj) = R 1 ^ is uniform between —1 and 1 ()Barnard. McCulloch. fc Mendl200"oh . The scale 
matrix is set to Ao = eo I, where J is a K x K identity matrix. The scale eo has the effect of setting a floor on each 
so that it does not fall below the value of eo/VAsn ~ 0.02. Since we do not realistically expect any standard deviation 
a\ to be less than a few 0.01 mag, this limit is conservative, and helps to prevent the MCMC chain from getting stuck 
in a region of parameter space with a near-zero variance, where the covariance matrix may be nearly singular. 

We found it useful to stabilize the estimation of magnitude variances with a power-law density: log/(<r^) = 
log/[cr(Mg)] = pAsN log a (Mb) with p w 0.09. We chose the smallest value of p for which the inferences were 
locally insensitive to its value. We have checked that the cho ice of p does not significantly impact the rms pre diction 
error. We also used the scaled inverse Wishart distribution (|Gelman fc Hilll [20061 : 10 'Mallev fc Zas lavskv 2008) as an 
alternative hyperprior and obtained comparable results. 

D. MATHEMATICAL DETAILS: BAYESN 
In this section, we provide mathematical results for each step of the BayeSN algorithm. 

1. The goal is to sample from P(/x^,S^| -,T>,Z). This can be factored as P(n^\ S^,, {t/> s })P(S^,| {ip s }). These 
densities only depend on {tp s } through the sufficient statistics: the sample mean xp, and the matrix sum of squared 
deviations from the mean: S^, = Y^f^ii^s ~ '0)('0s — '0) T - We generate a new from the proposal density 
<j(£* | {tp s }) = Inv- Wishart^ (£* | A^ 1 ) , where v N = v + A SN , and Ajv = + A . When the (e /N SN ) 2 is 
negligible compared to the variances, the expectation of this distribution is just the standard maximum likelihood 
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estimator of covariance, S^/Nsn- If / (o^) ^ 1 then the proposal is the same as P(EJ| {0 S }), and this is Gibbs 
sampling. If not, then the Metropolis-Hastings ratio r simplifies to r = /(o"^)//(<r^,). The proposal is accepted 
(X^ — > E^) with probability r. This method results in fast convergence since it allows for updating the entire 
covariance matrix at once. A new fi^, is Gibbs sampled from P(^\ E,/,, {tf> s }) — jV(/*^| ijj, E^/JVsn)- 

2. The conditional density for t a is P(ta| T>, Z) = P(t a \ {A s v }) = Inv-Gamma^l N SN , E^i A v)- 

3. The conditional densities P(/3\ a 2 , }r v ,A^ }) and P(cr 2 | Wv, Ay }) are standard results of Bayesian analysis of 
ordinary linear regression of ry versus A v (|Gelman et al.ll20M Ch. 14). 

4. Since the subsequent steps concern only one SN at a time, we suppress the label s on individual SN parameters. 

(a) The conditional posterior density of the fit (To, 0) for a single SN is proportional to 

P(T ,(jy\-,V s ,z s ) (x P(m\T ,(jy,z s ) x iV(0| E^) (Dl) 

where (1$ = + A + vfi, and the first factor is Eq. [3] We construct a proposal density for the new fit 
(T *, cf>*) given the current one: q(T *, <j>*\ T , 0) = <?(0* | T *; T , 0) x q{T£\ T ; 0). The proposal for the new 
T * is g(T *| T ; 0) = iV(T *| T , sf,)- The proposal q(<p*\ T *; T , 0) is an approximation to Eq. |DT]with the 
iiT-correction and Milky Way extinction factors fixed at the current fit (To,0). These depend on only 
through the apparent colors. The proposal is 

q(<t>*\ T*; T , 0) oc N[m\ KC(T ; z, 0) + GX(T ; z, 0, E MW ) + L 2 (T* , z)0* , W] x TV (0*| ^, E^) (D2) 

After algebraic simplifications, it can be shown that this is a Gaussian probability density on 0* and thus 
can be used to generate a random proposal. The joint proposal (T *, 0*) is accepted with probability 

r = P(T o *,0*|-,I? s ,Zs) g (0|r o ;T o *,0*) 

P(T o ,0|-,2? s ,z s ) g(0*|T o *;T o ,0)' 1 J 

The rejection step corrects the approximation of the conditional, Eq. ID11 with the proposal Eq. ID21 If the 
KC and GX factors are constant with respect to SN color and phase, then r = 1 and this is just Gibbs 
sampling. This scheme is efficient when KC and GX are slowly varying with phase and apparent color. 

(b) The conditional density for fi simplifies to P(/x| •, V s , z s ) — N(fi\ fi, a 2 ), where fi — s 2 l v T H^ j 1 (4> — A — /i^,); 
s~ 2 = v T T,^ ! 1 v; a~ 2 = s~ 2 + a~ 2 ; and fi — d- 2 (a~ 2 f(z) + s~ 2 /x). For prediction, we take — > oo. 

(c) The conditional density for Ay is P(Ay \ -,T> S , z s ) — P(Ay\4>, zx, ry; fi^, S^, ta, (3, cr 2 ). This is a probability 
density on Ay > proportional to N(Ay\A, s 2 A ) x N(r v \f3 + /3iA v ,a 2 ), where A = s\c T 'E^ 1 [4> — vfj, - 
fj,^] — s\/ta] c= (a + (3ry), and s^ 2 = c r E7 c. This can be sampled using griddy Gibbs sampling. 

(d) The conditional posterior P(ry\-,V s ,z s ) = P(rv|0, /x, Ay] fj,^, S^, j3, a 2 ). Defining a^ 2 = A' v /3 T 'E^f3; 

/i r = a 2 (3 T Ay'£^ J 1 {(f> — vfi - Ay a — jit^]; &~ 2 = b~ 2 + a~ 2 ; fy = a 2 [a~ 2 p, r + a~ 2 ((3 + ft. Ay)], this density 

is proportional to N(ry\ ry,cr 2 ) over the restricted range 0.18 < ry < 0.7. A new sample is generated by 
evaluating the pdf on a fine grid and using griddy Gibbs sampling. 

(e) (optional) Generalized conditional sampling allows the MCMC to move along expected degeneracies between 
parameters in the posterior density that may be oblique wi th respect to the natural coordinate system defined 
by the chosen parameters (|Liu fc Sa batti 2000; Liu 2002). We expect there to be a trade-off between dust 
extinction and distance to SN, since both make SN appear dimmer. Let p(Ay,fi) = P(Ay,fi\ -,T> s ,z s ) be 
the conditional posterior of dust and distance. To perform the translation (Ay, (j,) —> (Ay, /i) +7(1, — x), we 
first choose a scalar x which sets a direction in the (Ay,fJ>) plane to move along. To select an appropriate 
direction along the trade-off between dust and distance, we find x = min x |(a + (3ry) — xv\ 2 . For typical 
values of ry, this was x « 0.7. To select a translation vector near this direction, we sample x ~ N(0.7, 0.05). 
Then we sample a random 7 ~ p(Ay +7, /x — xj), where Ay and /x are the current values. The sample can 
be generated by evaluating the univariate density on a grid and using the inverse cdf method. Given 7 the 
chain can be translated to the new position. 
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TABLE 4 

Apparent Light Curve and Dust Estimates for Individual SN Ia 



SN 



Ami 5 (B) 



Rq 



•A) 



Ho 



A v c 68%(A V ) 



Rv 



Ref. 



SN1998bu 

SN1999cl 

SN2005el 

SN2005eq 

SN2006ax 



12.11 ±0.01 
14.86 ±0.03 
14.85 ± 0.02 
16.26 ±0.04 
15.01 ±0.02 



1.03 ±0.02 
1.17 ±0.07 
1.28 ±0.05 
0.88 ± 0.05 
1.05 ±0.03 



11.80 ±0.01 
13.74 ±0.02 
14.91 ±0.02 
16.23 ±0.03 
15.08 ±0.02 



11.65 ±0.01 
13.27 ±0.03 
14.97 ± 0.03 
16.33 ±0.04 
15.18 ±0.02 



11.74 ±0.02 
12.96 ±0.04 
15.53 ±0.02 
17.01 ±0.02 
15.87 ±0.02 



11.87 ±0.03 
13.03 ± 0.04 
15.75 ± 0.03 
17.26 ±0.05 
16.29 ±0.04 



0.97 
1.95 
0.01 
0.25 
0.01 



0.85, 1.07 
1.86, 2.10' 
0.00, 0.11' 
0.16, 0.40' 
0.00, 0.12 



2.2 ±0.3 
1.6 ±0.1 
2.8 ±0.6 
2.6 ±0.5 
2.8 ±0.6 



J99,H00 
KOO 
WC3 
WC3 
WC3 



NOTE. — This table is a representative stub. The /-band estimate is omitted here to preserve width. 

a Apparent magnitude at maximum light in rest frame B filter, corrected for Milky Way extinction and /^-corrections. Estimates only listed if SN was 
observed in the filter. 

Apparent magnitude in rest frame V at time of maximum in B, corrected for Milky Way extinction and iC-corrections. 
c Marginal posterior mode of extinction Ay, 

Highest posterior density interval containing 68% of marginal probability. 
6 Marginal posterior mean and standard deviation. 

f Reference codes: Cf A3: iHicken et aD ^2009al); WV 08: iWood-Vasev et al.l ( PAIR ITEL; f200§): WC3: WV08+ CfA3; J99: [jnTeTaD fl999l) ; H00: 
Hernandez et alJ d2000ft : KOO: iKrisciunas et alJ l(2000h: K01: IKrisciunas et alJ ^OOlT): D P02: iDi Faola et alJ lf2002T); V 03: iValentini et alJ p§03f); K03 : 
Krisciunas et al. (2003); K04b: Krisciunas et al. (2004b); K04c: Krisciunas et al. (2004c); K07: Krisciunas et al. (2007); ER.06: Elias-Rosa et al. (2006); 
ER07: lElias-Rosa et alJ 1 120081) ; Pa07: IPastorello et all j2007D ; St07: IStanishev et all 120071) ; P08: IPignata et al.l J2008I) . 



TABLE 5 

Distance Modulus Predictions for SN Ia 



SN 


cz 

[km s _1 ] 


MlcdmI 2 
[mag] 


°>l 2 
[mag] 


Mrcsub 

[mag] 


/^prcd 

[mag] 


^prcd 

[mag] 


^prcd 

[mag] 


^pred 

[mag] 


SN1998bu 


708.90 


29.97 


0.46 


30.00 


29.95 


0.02 


0.10 


0.97 


SN1999cl 


957.00 


30.62 


0.39 


30.94 


30.94 


0.05 


0.12 


1.94 


SN2005el 


4349.10 


33.93 


0.08 


33.89 


33.87 


0.04 


0.11 


0.03 


SN2005eq 


8535.00 


35.42 


0.04 


35.49 


35.49 


0.03 


0.11 


0.22 


SN2006ax 


5391.00 


34.40 


0.06 


34.36 


34.34 


0.02 


0.10 


0.03 



Note. — This table is a representative stub. /^lcdmI^ is the distance modulus expected 
from the redshift assuming h — 0.72. 0.M — 0.27, Ha — 0.73, w — —1. Its magnitude 
variance, assuming peculiar velocity dispersion cr pcc — 150 km s _1 is a^ L . /x rcsu b is the 
distance modulus estimated under rcsubstitution. Under bootstrap cross-validation, the 
mean prediction over bootstraps is /A pre d and the standard deviation of predictions over 
boostraps is s prc d. Zero values of s pre d are less than 0.005 mag. The average standard 
deviation of uncertainty of a predictions is cr prec j . The marginal posterior mode of Ay 
under prediction, averaged over the prediction sets for each SN is A^ ed . 



